*** marios is now known as marios|ruck | 05:16 | |
*** jpena|off is now known as jpena | 06:57 | |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add a meta log upload role with a failover mechanism https://review.opendev.org/c/zuul/zuul-jobs/+/795336 | 07:39 |
---|---|---|
*** bhagyashris_ is now known as bhagyashris | 09:40 | |
*** jpena is now known as jpena|lunch | 11:32 | |
*** jpena|lunch is now known as jpena | 12:29 | |
*** marios|ruck is now known as marios|ruck|call | 12:30 | |
pabelanger[m] | tobiash: if you have spare time, maybe your team and check out https://review.opendev.org/c/zuul/zuul/+/798131 | 12:40 |
tristanC | Restarting our main zuul scheduler now takes more than 15 minutes and it seems like the bottleneck is getting the branches information of repositories hosted on src.fedoraproject.org... | 12:43 |
tobiash[m] | pabelanger: do you know if the neutral status is available on all ghe versions? | 12:43 |
tristanC | Could we look into skipping that loading step for project with the include attribute set to []? Or is this going to be refactored with the rest of the scheduler ha? | 12:43 |
pabelanger[m] | I didn't see anything specific in https://docs.github.com/en/rest/reference/checks#update-a-check-run | 12:44 |
tobiash[m] | tristanC: I'm not aware about changes around this topic, do you think this is a regression or just something to be tweaked? | 12:44 |
tristanC | tobiash[m]: i am not sure if this is a regression, and/or if this is because we are adding more and more projects. But this is going to be a blocker as we would like to add a new tenant with double the amount of projects we currently have. | 12:50 |
tristanC | and since most of these projects do not include zuul configuration, i guess we could tweak the scheduler startup process by skipping the branch collection when include is set to [] | 12:51 |
tristanC | or perhaps this branch information should be stored in the ZooKeeper, so that multiple schedulers would not need to look it up when starting | 12:53 |
pabelanger[m] | I feel we had this issue before, about reloads taking a long time for projects zuul didn't really care about | 12:55 |
pabelanger[m] | in the early 3.0 days | 12:56 |
fungi | tristanC: not necessarily "refactored" but with the configuration state moving into zookeeper it will be persisted there between scheduler restarts | 13:35 |
fungi | so hopefully scheduler restarts will perform more like a smart-reconfigure | 13:37 |
corvus | some of the branch selection logic in the executor uses the list of branches on project even with include:[], so ignoring that wouldn't be straightforward (ie, it may not be correct in all cases). it might be best to wait until the scheduler cache is in place and see how that goes before looking into more specific optimizations | 13:42 |
tristanC | corvus: how far are we to have this cache? | 13:44 |
corvus | (it's partially in place now; we just still reload everything on startup) | 13:44 |
corvus | tristanC: do you know what part of the process takes so long? is there something in one of the drivers we could make less expensive? | 13:45 |
corvus | (because collecting the branches on all of opendev's zuul takes only a few seconds) | 13:45 |
clarkb | tristanC: tobiash[m]: I know for Gerrit they say that git protocol v2 really helps. I don't think src.fedoraproject.org is running gerrit but maybe that is something to try toggling on there too? | 13:46 |
clarkb | it may just be a client side thing (we did that in the zuul published images already though if you use those) | 13:47 |
corvus | RUN git config --system protocol.version 2 | 13:48 |
corvus | https://review.opendev.org/766765 for more info | 13:48 |
fungi | though it's worth keeping in mind that while opendev does have a lot of projects we culturally (mostly due to the gerrit workflow) don't have the proliferation of thousands of topic branches you see on some other projects in the wild | 14:07 |
corvus | yep, just wondering what's eating the time? does the problem scale with number of projects or number of branches? | 14:09 |
tristanC | corvus: it seems to scale with the number of projects, and we guess this is cause by src.fedoraproject.org being slow | 14:12 |
fungi | and that's pagure, right? | 14:13 |
tristanC | fungi: yes, with >30k projects | 14:14 |
*** marios|ruck|call is now known as marios|ruck | 14:15 | |
corvus | if so, it uses the api to get branches; another option might be to use git | 14:15 |
tristanC | we would like to add all those project to zuul, but the scheduler is currently struggling to load 1k projects in reasonable amount of time | 14:15 |
clarkb | pabelanger[m]: tobiash[m] for neutral and skipped statuses do we need GHE to handle them to land the change? Or is it only a problem if the user sets that status in their config and their github doesn't support it? | 14:39 |
opendevreview | Merged zuul/zuul master: Increase unit test job timeout to 90 minutes https://review.opendev.org/c/zuul/zuul/+/798019 | 14:51 |
pabelanger[m] | clarkb: I am not sure, I would imagine if GHE didn't support it, they would raise an error from the API. | 14:56 |
pabelanger[m] | however, I don't see any dates or API release notes to say when it was added | 14:57 |
clarkb | pabelanger[m]: looks like you cna select the github api version in the top right of the docs page. It defaults to github.com | 15:00 |
clarkb | and while I see GHE 3.1 and 3.0 list neutral I don't see skipped on any of the version pages | 15:00 |
clarkb | I suspect it is fine https://docs.github.com/en/enterprise-server@3.1/rest/guides/getting-started-with-the-checks-api | 15:01 |
clarkb | corvus: is https://review.opendev.org/c/zuul/zuul/+/797180/2 desireable if testing shows it is slightly slower than using alembic? Maybe we want tokeep the two in sync anyway and this is a good way to ensure that? | 15:04 |
corvus | clarkb: my thinking is yes -- it's not slow enough to make a difference, and the functionality is an improvement | 15:08 |
corvus | (just getting them in sync already caught one minor error) | 15:08 |
pabelanger[m] | clarkb: I can see it in 2.21 https://docs.github.com/en/enterprise-server@2.21/rest/reference/checks#update-a-check-run | 15:24 |
clarkb | pabelanger[m]: ya I sspect your change is fine given those docs | 15:24 |
pabelanger[m] | so, think we are safe | 15:24 |
clarkb | corvus: do we also need to test postgres the same way? | 15:28 |
corvus | clarkb: not a bad idea; i can work on a change for that | 15:30 |
corvus | (it should be the same, but having an extra test could catch an unlikely edge case) | 15:31 |
clarkb | ya and we may have different indexes or types with postgres | 15:32 |
clarkb | corvus: in https://review.opendev.org/c/zuul/zuul/+/797181/1/tests/base.py should we just go ahead and delete that commented out code? Or do you want to do that when you update all of the test cases to use the new parent class? | 15:38 |
corvus | clarkb: that should probably be deleted from that change | 15:41 |
clarkb | also what is the difference betwene the sql driver and the database driver? We seem to hanlde both driver setups | 15:46 |
clarkb | we made the canonical config "database" right? | 15:46 |
clarkb | ya if database then driver = self.drivers['sql'] | 15:47 |
clarkb | left another note on that change | 15:50 |
corvus | clarkb: yeah, sql is deprecated, to be removed with 5.0 | 15:58 |
clarkb | fungi: corvus is https://review.opendev.org/c/zuul/zuul/+/798092 related to that zuul-jobs update mhu1 was writing? if so I suspect that is a priority even if it has conflicts iwth many changes | 16:00 |
*** jpena is now known as jpena|off | 16:00 | |
corvus | heh, i only see it conflicting with 1 change; but yes, that is related to mhu1's change -- we can revert mhu's change after we make a release with 798092 in it | 16:01 |
fungi | clarkb: yes, it's the longer-term solution which the zuul-jobs change is attempting to provide a temporary workaround for | 16:01 |
fungi | which i was also able to get working over the weekend if anyone has time to look at it | 16:02 |
clarkb | corvus: gerrit web ui shows ~40? conflicting changes | 16:02 |
corvus | clarkb: impressive! gertty shows it only conflicting with https://review.opendev.org/664965 | 16:03 |
corvus | (with which, btw, it truly does conflict -- i just checked) | 16:04 |
fungi | i think gerrit considers two changes to "conflict" if they both edit the same file | 16:04 |
corvus | anyway, i agree, it's a priority | 16:04 |
clarkb | that change would write out the zuul resources dict to that file via zuul_vars.update({'resources': zuul_resources}) | 16:06 |
clarkb | we don't put any secret connection details in there thatwe might accidentally expose? This would be for container workloads I think (since winrm and ssh rely on external credentials) | 16:06 |
opendevreview | James E. Blair proposed zuul/zuul master: Replace TreeCache in component registry https://review.opendev.org/c/zuul/zuul/+/796582 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Add ExecutorApi https://review.opendev.org/c/zuul/zuul/+/770902 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Change zone handling in ExecutorApi https://review.opendev.org/c/zuul/zuul/+/787833 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Switch to string constants in BuildRequest https://review.opendev.org/c/zuul/zuul/+/791849 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Clean up Executor API build request locking and add tests https://review.opendev.org/c/zuul/zuul/+/788624 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix race with watches in ExecutorAPI https://review.opendev.org/c/zuul/zuul/+/792300 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/788988 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler https://review.opendev.org/c/zuul/zuul/+/794687 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Handle errors in the executor main loop https://review.opendev.org/c/zuul/zuul/+/796583 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Shard BuildRequest parameters https://review.opendev.org/c/zuul/zuul/+/797149 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Compress sharded ZK data https://review.opendev.org/c/zuul/zuul/+/797156 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Switch to ZooKeeper backed merge result events https://review.opendev.org/c/zuul/zuul/+/784195 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Switch from global to tenant event queues https://review.opendev.org/c/zuul/zuul/+/797440 | 16:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove unused addResultEvent method https://review.opendev.org/c/zuul/zuul/+/797542 | 16:07 |
corvus | clarkb: hrm, it probably does have connection information... | 16:07 |
corvus | clarkb: it's possible we "leak" that now in the same way that we leaked the buildset registry info before | 16:08 |
clarkb | hrm | 16:08 |
corvus | (ie, it's only available after the job finishes, so it doesn't matter) | 16:08 |
*** marios|ruck is now known as marios|out | 16:09 | |
corvus | but considering what we learned about the buildset registry, we should think about whether that's actually true | 16:09 |
clarkb | ya I was trying to think if there would be a case where it matters. In theory the creds are only used while the job is running and once the job is done it would no longer become a problem | 16:09 |
clarkb | exceptions would be if we returned those credentials while the job was running (via console log or zuul_return) or if we reused credentials between jobs | 16:10 |
corvus | zuul_return won't expose those until the end of the job though | 16:11 |
clarkb | ah yup, they would have to then be used by another job consuming the return somehow | 16:11 |
corvus | i think my inclination right now is to say that it's okay because it doesn't escape the executor until the end of the job. and that in general, the "inventory" part of that data is useful debugging info. but if we wanted to tighten it up a bit, we could remove the credentials from there when writing the debug inventory.yaml file; but that would basically be special-casing some stuff. | 16:12 |
clarkb | maybe tristanC and tobiash[m] want to weigh in since they use systems with credetnials in there? (I don't think opendev really does) | 16:13 |
clarkb | I'll +2 the change and they can approve it if they are ok wtih it | 16:14 |
corvus | (and to be clear: it's not a regression -- but it is an opportunity to rethink something that has been happening for a while) | 16:16 |
clarkb | ++ | 16:16 |
clarkb | corvus: https://review.opendev.org/c/zuul/zuul/+/795195 ianw reports that firefox 88 exhibits the issue with xterm.js text highlights | 16:16 |
clarkb | I suspect that maybe there is an interaction between xterm.js and $browser $version. Probably worth updating to ensure it is happy regradless of browser | 16:17 |
corvus | clarkb: yep -- do we know if anyone's rechecked the behavior in the build? | 16:17 |
clarkb | corvus: using the preview site? I haven't. But I can do that now | 16:18 |
clarkb | I don't have unhappy FF though | 16:18 |
corvus | oh interesting, the scrollbar is back to being unhappy for me, but i also don't have a selection issue either way. | 16:19 |
corvus | anyway, change lgtm | 16:19 |
clarkb | corvus: on FF 90.0b12 the highlighting works for me using the site preview | 16:19 |
clarkb | The scrollbar seems to work | 16:19 |
corvus | my thought process was: something changed in production, better double check the change | 16:20 |
clarkb | ++ | 16:20 |
clarkb | I suspect the production hcnage was we all get newer FF installs | 16:20 |
corvus | ya | 16:20 |
corvus | (and that also could affect the change :) | 16:20 |
opendevreview | James E. Blair proposed zuul/zuul master: Use an actual database in all Zuul tests https://review.opendev.org/c/zuul/zuul/+/797181 | 16:22 |
avass[m] | corvus, clarkb: do you mean the missing zuul.* vars and kubernetes/openshift credentials? or do other drivers use credentials? | 16:23 |
clarkb | avass[m]: I think this would be largely limited to drivers that don't use key auth (so ya k8s and openshift) | 16:23 |
avass[m] | so that | 16:23 |
corvus | yeah, it's basically drivers that don't end up putting stuff in the ansible inventory | 16:24 |
avass[m] | that's also specifically the namespace drivers I suppose | 16:24 |
corvus | clarkb: https://review.opendev.org/796582 is the next v5 thing, btw; i think it's ready to go | 16:26 |
clarkb | corvus: ok I'll take a look | 16:26 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase https://review.opendev.org/c/zuul/zuul/+/798350 | 16:29 |
opendevreview | James E. Blair proposed zuul/zuul master: Use an actual database in all Zuul tests https://review.opendev.org/c/zuul/zuul/+/797181 | 17:31 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase https://review.opendev.org/c/zuul/zuul/+/798350 | 17:31 |
opendevreview | James E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds https://review.opendev.org/c/zuul/zuul/+/797182 | 17:31 |
opendevreview | James E. Blair proposed zuul/zuul master: Add nodeset to build table https://review.opendev.org/c/zuul/zuul/+/798207 | 17:31 |
opendevreview | James E. Blair proposed zuul/zuul master: Add Postgres schema test https://review.opendev.org/c/zuul/zuul/+/798362 | 17:31 |
corvus | clarkb: ^ i think that's your comments addressed | 17:31 |
corvus | took a bit to figure out a workable (i hope?) solution for postgres schema comparison | 17:32 |
clarkb | corvus: thanks. I'm still working through the TreeCache replacement. It isn't a quick review | 17:32 |
corvus | clarkb: oh sorry :( | 17:32 |
clarkb | no worries, I just like to think about the state as I'm going through and should probably drag out pen and paper but doing it in my head instead | 17:33 |
clarkb | I think I'm almost through it now though. | 17:34 |
corvus | clarkb: the good news is that the executor api behaves in exactly the same way (it's several changes down the stack); so it's a pattern we'll be establishing. the executor api came first; then we found out the component registry had bugs that kept failing changes further down the stack. | 17:35 |
opendevreview | Merged zuul/zuul master: web: update to latest xterm; disabled selection overrides https://review.opendev.org/c/zuul/zuul/+/795195 | 17:45 |
clarkb | corvus: I left a couple of questions on the TreeCache replacement. Can you take a look at those? | 17:47 |
clarkb | also huh does postgres not do the show create table stuff? | 17:50 |
corvus | nope | 17:50 |
corvus | i mean, "nope" to question #2. question #1 is "yes i'm looking now" :) | 17:51 |
corvus | clarkb: replied | 17:54 |
clarkb | wfm | 17:55 |
fungi | "tell em yes on one and no on two" - buckaroo banzai | 17:55 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase https://review.opendev.org/c/zuul/zuul/+/798350 | 19:33 |
opendevreview | James E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds https://review.opendev.org/c/zuul/zuul/+/797182 | 19:33 |
opendevreview | James E. Blair proposed zuul/zuul master: Add nodeset to build table https://review.opendev.org/c/zuul/zuul/+/798207 | 19:33 |
clarkb | if anyone has review time https://review.opendev.org/c/zuul/nodepool/+/793508 is the last change in hte azure driver rewrite stack. A pretty easy cleanup fix | 19:53 |
clarkb | corvus: https://review.opendev.org/c/zuul/nodepool/+/785821 is an accounting change in nodepool that you may want to look at. I think I didn't approve it due to the zuul 4.6.0 work and wanting to avoid landing any problems concurrent with that | 19:54 |
clarkb | corvus: its parent too | 19:54 |
clarkb | tristanC: I have approved https://review.opendev.org/c/zuul/nodepool/+/777022 with one minor nit noted. I don't think its a big deal. more of a code simplification | 20:02 |
corvus | clarkb, ianw, fungi, tobiash: re 785821 -- left a q on there | 20:16 |
corvus | i left a +1 on that to indicate i think the change is probably okay, but maybe if we do the other thing we can delete a bunch of code and not have to think about changes like that :) | 20:18 |
clarkb | corvus: I think if we do that we have the launch failures due to being over quota when we think we are under? The provider will then go into pause mode until it frees space so it isn't fatal but can be noisy in logs iirc | 20:18 |
clarkb | I think the idea behind the code there is to include leaked instances in quota calculations to avoid that if possible | 20:20 |
corvus | clarkb: as long as the provider counts the leaked nodes as having reduced quota, the effect should be the same -- though it's true that we might bump into a quota error before we discover it. should just happen once at most though. | 20:20 |
corvus | alternatively, we could have the leak detector update the quota. | 20:20 |
corvus | i think this will happen if/when we port openstack to the state machine driver framework anyway | 20:21 |
clarkb | ya I think this approach is one way to have the provider count the leaked nodes as reducing quota | 20:21 |
corvus | clarkb: no sorry i meant something really specific by that | 20:22 |
clarkb | https://review.opendev.org/c/zuul/nodepool/+/795642 is an unrelated change that should be approvable now (the functional openshift job just succeeded in the check queue for it) | 20:23 |
clarkb | if anyone wants to be a quick second review on it | 20:23 |
corvus | the provider keeps track of quota that is reduced by nodes it doesn't know about; by "remembering" these nodes, they account for quota that we do know about. if we "forget" them, then they should appear in quota of nodes we don't know about | 20:23 |
corvus | every driver other than openstack does the second thing, and is simpler. openstack does the first thing and is complex. | 20:23 |
clarkb | got it | 20:24 |
clarkb | maybe we land these changes since they improve the current driver for those of us that run from master in particular. Then plan to switch when it gets rewritten to the state machine driver? | 20:25 |
clarkb | (sound slike that would just happen for free) | 20:25 |
corvus | well, i dunno if i'd say "for free" :) | 20:26 |
corvus | i don't know of anyone currently signed up to rewrite the openstack driver using state machine atm | 20:27 |
clarkb | well as a necessary part of any rewrite I guess (the rewrite itself isn't free) | 20:27 |
corvus | so my personal preference would be to incrementally make it better and move toward it, which is why i made that suggestion | 20:27 |
clarkb | I see | 20:27 |
clarkb | I'll let ianw weigh in then as the original author | 20:28 |
corvus | yeah, i'm not opposed; just that if presented the choice of "add more complexity" and "delete code" i'd go with 2. | 20:29 |
clarkb | does the openstack dirver have handling for the unmanaged quota? | 20:29 |
corvus | yep | 20:29 |
clarkb | I assume so if it is common to the other drivers since they inherited a bunch of openstack stuff | 20:29 |
clarkb | ok cool | 20:29 |
corvus | clarkb: can you review https://review.opendev.org/798374 real quick-like? should unblock gating.dev dns | 20:32 |
clarkb | done | 20:34 |
opendevreview | Merged zuul/nodepool master: kubernetes: refactor client creation to utils_k8s https://review.opendev.org/c/zuul/nodepool/+/777022 | 21:16 |
corvus | clarkb: if you have some brainspace left, this is an interesting non-change that i think could use your review: https://review.opendev.org/794050 | 21:37 |
opendevreview | Merged zuul/nodepool master: QuotaInformation : abstract resource recording https://review.opendev.org/c/zuul/nodepool/+/787093 | 21:46 |
fungi | hopefully he's not passed out from heatstroke | 21:47 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase https://review.opendev.org/c/zuul/zuul/+/798350 | 21:59 |
opendevreview | James E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds https://review.opendev.org/c/zuul/zuul/+/797182 | 21:59 |
opendevreview | James E. Blair proposed zuul/zuul master: Add nodeset to build table https://review.opendev.org/c/zuul/zuul/+/798207 | 21:59 |
SpamapS | Hey all. Just wanted to say, well written security messaging today. Sorry for the bugs in 3! ;) | 22:59 |
fungi | SpamapS: ooh, did you see the zuul-announce mailing list post first, or the twitters? | 23:08 |
fungi | we asked foundation comms folks to spread the word on social media today in case folks missed the ml announcement thursday | 23:09 |
corvus | maybe we need to bounce those messages to -discuss too? | 23:12 |
corvus | spamaps: also, you have nothing to apologize for; you gave us bwrap which has basically saved our skin on every other security issue so far, so thanks! :) | 23:15 |
corvus | and thanks for the kudos on the msg :) it's nice to know it's comprehensible | 23:16 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase https://review.opendev.org/c/zuul/zuul/+/798350 | 23:18 |
opendevreview | James E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds https://review.opendev.org/c/zuul/zuul/+/797182 | 23:18 |
opendevreview | James E. Blair proposed zuul/zuul master: Add nodeset to build table https://review.opendev.org/c/zuul/zuul/+/798207 | 23:18 |
SpamapS | <corvus "spamaps: also, you have nothing "> aww shucks. :) Glad it's been handy. It was a great idea, the origin of which I don't recall. :) | 23:34 |
SpamapS | <corvus "and thanks for the kudos on the "> Yeah I just liked how clear and matter-of-fact it was. :) I saw it on the Twitters yes. | 23:34 |
SpamapS | sadly, I rarely get around to reading -discuss until the weekend. | 23:35 |
fungi | SpamapS: that's great feedback, it'll help me remember to get the openinfra social media crew involved sooner in future cases | 23:36 |
fungi | by the time i was caught up on opendev's upgrade and the related config changes/fixes, thursday was gone, and friday's not a good day to announce things if you want people to actually notice | 23:37 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!