Thursday, 2021-04-08

openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503900:33
*** josefwells has joined #zuul01:10
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503901:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2  https://review.opendev.org/c/zuul/zuul-operator/+/78504701:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper and DB  https://review.opendev.org/c/zuul/zuul-operator/+/78527301:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Pass through extra scheduler config options  https://review.opendev.org/c/zuul/zuul-operator/+/78527701:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Add merger support  https://review.opendev.org/c/zuul/zuul-operator/+/78527801:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support imagePrefix and versions  https://review.opendev.org/c/zuul/zuul-operator/+/78527901:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508301:14
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support fingergw  https://review.opendev.org/c/zuul/zuul-operator/+/78530001:14
*** hamalq has quit IRC01:21
*** spotz has quit IRC01:32
*** josefwells has quit IRC01:58
*** rlandy|rover|bbl is now known as rlandy|rover02:17
*** avass has quit IRC02:22
*** rlandy|rover has quit IRC02:32
*** evrardjp has quit IRC02:33
*** evrardjp has joined #zuul02:33
*** sam_wan has joined #zuul02:55
*** sam_wan has quit IRC02:56
*** sam_wan has joined #zuul03:24
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503903:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2  https://review.opendev.org/c/zuul/zuul-operator/+/78504703:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper and DB  https://review.opendev.org/c/zuul/zuul-operator/+/78527303:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Pass through extra scheduler config options  https://review.opendev.org/c/zuul/zuul-operator/+/78527703:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Add merger support  https://review.opendev.org/c/zuul/zuul-operator/+/78527803:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support imagePrefix and versions  https://review.opendev.org/c/zuul/zuul-operator/+/78527903:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support fingergw  https://review.opendev.org/c/zuul/zuul-operator/+/78530003:48
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508303:48
*** ykarel|away has joined #zuul03:54
*** vishalmanchanda has joined #zuul04:15
*** saneax has joined #zuul04:20
*** paladox has quit IRC04:27
*** saneax has quit IRC04:31
*** paladox has joined #zuul04:33
*** paladox has quit IRC04:43
*** paladox has joined #zuul04:45
*** saneax has joined #zuul04:51
*** jfoufas1 has joined #zuul04:57
openstackgerritSimon Westphahl proposed zuul/zuul master: Stop active event gathering on connection loss  https://review.opendev.org/c/zuul/zuul/+/78510006:16
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing repo state restore  https://review.opendev.org/c/zuul/zuul/+/78531006:42
*** ykarel|away is now known as ykarel06:43
tobiashclarkb: this is a more generic attempt to fix the repo state restore ^06:43
*** tosky has joined #zuul06:45
*** reiterative has quit IRC06:49
*** reiterative has joined #zuul06:49
*** jpena|off is now known as jpena06:50
*** avass has joined #zuul06:57
*** rpittau|afk is now known as rpittau07:01
avassAre there still problems with arm nodes? zuul/zuul-jobs gate is currently blocked because of it07:03
*** jcapitao has joined #zuul07:04
*** saneax has quit IRC07:17
*** jcapitao has quit IRC07:19
*** jcapitao has joined #zuul07:21
*** saneax has joined #zuul07:38
*** ykarel_ has joined #zuul08:00
*** ykarel has quit IRC08:02
*** ykarel_ is now known as ykarel08:02
openstackgerritMerged zuul/zuul master: Gitlab: raise MergeFailure exception to retry a failing merge  https://review.opendev.org/c/zuul/zuul/+/77716908:11
openstackgerritMerged zuul/zuul master: Add messages to make the job setup more transparent  https://review.opendev.org/c/zuul/zuul/+/77788508:13
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing repo state restore  https://review.opendev.org/c/zuul/zuul/+/78531008:13
openstackgerritMerged zuul/nodepool master: Add simple load testing script  https://review.opendev.org/c/zuul/nodepool/+/77584308:18
*** ykarel is now known as ykarel|lunch08:23
openstackgerritIan Wienand proposed zuul/nodepool master: Require dib 3.9.0  https://review.opendev.org/c/zuul/nodepool/+/78534709:12
*** ykarel|lunch is now known as ykarel09:22
*** sshnaidm|afk is now known as sshnaidm09:31
*** jcapitao has quit IRC09:46
iceyn10:31
*** sshnaidm has quit IRC11:01
*** sshnaidm has joined #zuul11:04
*** sam_wan has quit IRC11:08
*** jpena is now known as jpena|lunch11:30
*** sshnaidm has quit IRC11:37
*** rlandy has joined #zuul11:43
*** rlandy is now known as rlandy|rover11:43
*** pots has quit IRC11:49
*** pots has joined #zuul11:50
*** sshnaidm has joined #zuul11:50
*** jcapitao has joined #zuul12:07
*** jpena|lunch is now known as jpena12:31
zbrtobiash: tristanC: fungi: https://review.opendev.org/c/zuul/zuul/+/766460 is finally green, improved dev guide page: https://d1c94900ec6853cf329a-2ce6583bcfee959f0e7ee40d82e3f479.ssl.cf2.rackcdn.com/766460/26/check/zuul-tox-docs/6ed9304/docs/reference/developer/index.html?highlight=developer12:58
*** sanjayu_ has joined #zuul13:01
tristanCzbr: thanks!13:03
*** saneax has quit IRC13:03
*** sanjayu__ has joined #zuul13:16
*** Goneri has joined #zuul13:18
*** sanjayu_ has quit IRC13:18
zbrtristanC: it took 10x more time than I was expecting for such a minor docs improvement.13:26
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503913:43
openstackgerritJames E. Blair proposed zuul/zuul master: Add a checkpoint release note  https://review.opendev.org/c/zuul/zuul/+/78505413:53
tobiashclarkb, corvus: the repo state fix revealed a conceptual conflict between buildset-global repo state and job-refreezing during tenant reconfigurations14:00
corvustobiash: because a reconfig could add a project to required-projects?14:01
tobiashyes, or just a job that needs a new playbook14:02
tobiashso the goal of buildset-global repo state is to make all jobs within a buildset consistent so all use the same repo states14:03
tobiasha refreeze during reconfig basically can alter the job config of all jobs within a buildset that have not yet been started14:05
tobiashthinking about this I think the refreeze during a reconfig also breaks the consistency within a buildset14:06
corvustobiash: that's correct; we accepted it as a sort of best effort.  unlike a strict gating sequence, there's really no synthetic point in time where we can say that a new configuration should apply.  so we just apply it asap, on new jobs, and we don't try to apply it retroactively on already running or complete jobs (ie, we don't abort/re-run them)14:08
tobiashcorvus: I think one way to fix this conflict and keep the consistency could be to only refreeze buildsets which have no running jobs yet14:09
tobiashif we don't accept the less asap way another option could be to re-process the merge and repo state generation and use the new repo state for the new jobs, but that breaks the goal of global repo state in case of reconfigs14:12
tobiasha side effect of the no-refreeze-started-buildsets approach would be a much faster reconfig14:14
corvustobiash: i don't think there's a clearly right/wrong answer here and would be comfortable with either.  i think option 1 is probably a little better because it does try to maintain the global state.  it might be good to store a flag on items which are running with a prior configuration so if we wanted, we could display that in the ui.  the biggest advantage of the asap approach is being able to fix something14:16
corvusexternal to the gate chain and have it apply quickly.  like, if zuul ran tempest but isn't in tempest's gate queue, a fix to tempest would take effect sooner.  having that flag in the ui could help people decide to dequeue buildsets running on the old config.14:16
corvustobiash: iow, probably picking one, documenting it, and making it visible/discoverable is the best thing we can do in an ambiguous situation like this :|14:17
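Editor's sketch of the first idea discussed above (only refreeze buildsets with no running jobs, and flag items still running on a prior configuration so a UI could show it). This is a minimal illustration under stated assumptions, not Zuul's implementation: `BuildSet`, `started_jobs`, `stale_config` and `apply_reconfig` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BuildSet:
    """Toy stand-in for a buildset; not Zuul's real class."""
    jobs: List[str]
    started_jobs: List[str] = field(default_factory=list)
    # Hypothetical flag: True when the buildset still runs a prior config.
    stale_config: bool = False


def apply_reconfig(buildset: BuildSet, new_jobs: List[str]) -> BuildSet:
    """Refreeze the job list only if no job in the buildset has started."""
    if buildset.started_jobs:
        # Keep the frozen jobs so all of them share one repo state,
        # but record that the buildset predates the current config.
        buildset.stale_config = True
        return buildset
    # Nothing has started yet, so adopting the new job list is safe.
    buildset.jobs = list(new_jobs)
    buildset.stale_config = False
    return buildset
```

With this policy a reconfiguration never alters a buildset that has begun running, which is also why it would no longer remove broken jobs from in-flight items.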
zbrcorvus: re TESTING.rst, afaik that page is not part of the built docs. not sure how to fix that.14:26
zbrthat one is static, the other one is dynamic. we could move that one inside the development one and remove this file.14:28
corvuszbr: that works for me, thanks14:29
corvusi think the main thing to avoid is having two sources of truth14:29
zbri agree with that, i will also make a few small fixes like replacing py35 with just py on it14:30
corvus++14:30
corvustobiash: so is this the repo update problem that clarkb saw?14:30
tobiashcorvus: well, the current global repo state implementation is broken which creates that problem as a side effect14:31
*** ykarel has quit IRC14:32
corvustobiash: 785310 fixes that?14:32
*** ykarel has joined #zuul14:32
tobiashcorvus: yes14:33
tobiashthe problem was that the global repo state was created, then the isupdateof thinks no update is needed but then the restore of the repo state is missing14:34
tobiashwhich basically leads to a potentially wrong commit checked out14:34
tobiashbut 785310 doesn't pass tests that test the reconfig14:35
corvustobiash: okay, so 2 related issues: a regression which is theoretically fixed by 785310, and the newly discovered/discussed inconsistency in reconfiguration, which doesn't have a patchset yet (which isn't so much a regression as a slight undermining of the intent of global state).  we probably need the first in 4.2.0, but the second can wait a bit if necessary?14:35
corvusoh14:37
tobiashthe problem is that 785310 requires a fix for the inconsistency in reconfig14:37
corvustobiash: we might need the second to fix the tests on the first :)14:37
corvusya that :)14:37
corvusi'm waking up :)14:37
tobiashanother option would be to revert global repo state and re-merge it when we have a fix for the reconfig inconsistency14:38
tobiashthat wouldn't block the release then14:38
openstackgerritTobias Henkel proposed zuul/zuul master: Revert "Make repo state buildset global"  https://review.opendev.org/c/zuul/zuul/+/78542714:43
tobiashcorvus: revert of the repo state in case we want to revert until we have the reconfig change ^14:43
zbrI observed an annoying behavior on ubuntu where it prompts me with: correct 'docs' to 'doc' [nyae]? n -- when I run tox -e docs14:45
zbrthis is because the docs folder is called doc but the tox command is docs.14:45
zbrideally we should be consistent and avoid confusing the tooling14:46
fungiavass: we identified a recent regression in diskimage-builder 3.7.1 related to the introduction of secure boot support for centos 8.x, which broke the efi config and hence the ability of us to boot our centos-8-arm64 and centos-stream-8-arm64 images for the past ~week14:52
fungidiskimage-builder 3.9.0 has the fix and we're in the process of trying to get working images built in nodepool again for those14:52
fungiunfortunately nobody spotted that the nodes weren't booting at the end of last week, so by the time it was brought to our attention on saturday we no longer had any bootable images for those14:53
fungias i discovered after trying to roll back to the previous image, which was broken in the same way14:54
*** ykarel has quit IRC14:57
*** spotz has joined #zuul14:58
openstackgerritSorin Sbârnea proposed zuul/zuul master: Move testing doc into sphinx doc  https://review.opendev.org/c/zuul/zuul/+/78543015:00
corvusavass, fungi: we don't use arm nodes in the zuul-jobs gate do we?15:01
corvusoh apparently we do15:02
corvusbut we shouldn't15:02
clarkbI think there are jobs that try to cover a couple of arm specific things?15:03
corvusclarkb: one job for one specific thing: https://review.opendev.org/74624515:03
clarkbtobiash: did you have a chance to see my comments on https://review.opendev.org/c/zuul/zuul/+/785152 ? I'll look at the alternative shortly15:03
corvusavass, fungi, clarkb, ianw: the arm resource pool isn't really robust enough for us to add jobs to check/gate; if we really want to run that, we should probably add a second check pipeline15:04
corvusavass: i'd be in favor of disabling that job for now (and when/if we re-enable it, put it in another pipeline)15:05
clarkb++ the second check pipeline seems to work well for system-config15:05
fungii concur15:05
fungiright now we literally have only one provider for those nodes15:05
fungia second would be most welcome15:05
clarkbwe have a potential second provider that reached out just as my day ended yesterday15:06
fungithe recent dib regression is somewhat related to that lack of robustness. we don't want to gate dib changes on arm jobs so don't test that they're able to boot15:07
clarkbkevinz seems to have sent them to ianw and myself, will say more when I know more :)15:07
corvuscool; to be clear (since i'm about to propose a change to remove them) i think the arm64 jobs are worthwhile and would love to have them either in a second pipeline, or when things are robust enough (2 providers: yay!) to have them back in check :)15:08
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Remove arm64 jobs (temporarily)  https://review.opendev.org/c/zuul/zuul-jobs/+/78543215:09
corvusavass, fungi, clarkb: ^15:10
corvustobiash: i'll see if i can cobble together a change to the reconfiguration as we discussed; if we can get that together by the end of the day, then it's probably worth doing that; otherwise, let's take the revert.  so hopefully either way we can restart tomorrow.15:11
corvusclarkb: ^ for when you're caught up :)15:11
clarkbcorvus: that plan makes sense to me. The reconfiguration change would be in addition to https://review.opendev.org/c/zuul/zuul/+/785310 ? Note this change isn't passing tests yet15:12
clarkbOnce I've got some tea and breakfast I'm going to try and review that one15:14
corvusclarkb: it'll have to be included -- the tests are failing because they need the reconfiguration change15:15
clarkbgot it15:18
openstackgerritMerged zuul/zuul master: Stop active event gathering on connection loss  https://review.opendev.org/c/zuul/zuul/+/78510015:34
*** sanjayu__ has quit IRC15:52
*** rpittau is now known as rpittau|bbl15:57
*** jpena is now known as jpena|off15:59
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503916:31
*** hamalq has joined #zuul16:33
*** vishalmanchanda has quit IRC16:34
clarkbcorvus: tobiash: if I'm reading this change correctly essentially the way we're proposing things work is that isUpdateNeeded() will always ensure the necessary revs are present in the repo then later we run _restoreRepoState() with repo_state that says project foo branch bar is at rev xyz16:39
*** jcapitao has quit IRC16:39
clarkband that step will update branch bar to point at rev xyz if it doesn't already16:39
clarkband I guess that repo_state will always know about all the tags and heads and such?16:39
tobiashyes16:39
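Editor's sketch of the two steps clarkb describes: first make sure the wanted revisions exist locally, then point each branch at the revision recorded in the repo state. It uses a toy in-memory repo rather than git, and `ToyRepo`, `fetch` and `prepare` are hypothetical names. Only the idea is taken from the discussion: the restore must run even when no update was needed, otherwise a branch can stay on the wrong commit.

```python
from typing import Dict, Set

# Maps branch name -> commit sha, e.g. {"bar": "xyz"}.
RepoState = Dict[str, str]


class ToyRepo:
    """Minimal in-memory model of a repo; not Zuul's Repo class."""

    def __init__(self, objects: Set[str], refs: RepoState) -> None:
        self.objects = set(objects)  # commits present locally
        self.refs = dict(refs)       # branch -> sha it currently points at

    def is_update_needed(self, repo_state: RepoState) -> bool:
        # A fetch is needed only when a wanted commit is missing locally.
        return any(sha not in self.objects for sha in repo_state.values())

    def fetch(self, repo_state: RepoState) -> None:
        # Stand-in for a real fetch: pretend the commits were downloaded.
        self.objects.update(repo_state.values())

    def restore_repo_state(self, repo_state: RepoState) -> None:
        # Point every branch at the revision recorded in the repo state.
        for branch, sha in repo_state.items():
            self.refs[branch] = sha


def prepare(repo: ToyRepo, repo_state: RepoState) -> None:
    if repo.is_update_needed(repo_state):
        repo.fetch(repo_state)
    # Restore unconditionally: the commit may already be present while
    # the branch still points somewhere else.
    repo.restore_repo_state(repo_state)
```

In the example case, a repo that already holds commit xyz but has branch bar at another commit needs no fetch, yet still needs the restore step.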
clarkbthanks, in that case these changes seem like they should work (once updated to fix the other issue that was exposed)16:41
*** sshnaidm has quit IRC16:43
corvusi'm a little confused about the two different repo states in the executor16:44
*** sshnaidm has joined #zuul16:44
openstackgerritMerged zuul/nodepool master: Require dib 3.9.0  https://review.opendev.org/c/zuul/nodepool/+/78534716:50
*** sshnaidm is now known as sshnaidm|afk17:10
corvustobiash: can you look at my comment on that change?  (cc clarkb)17:15
corvusif i'm right, then that bit of code is unnecessary (but extra safe so maybe we want to keep it anyway but change the comment?)  if i'm wrong, then i'm missing something and would like to fully understand.  :)17:17
tobiashcorvus: as far as I've understood the mergeItem it updates the merged branch within the repo state17:19
tobiashthat's why I wanted to be sure that we use an original repo state for trusted repos later17:19
corvustobiash: i don't see how it can do that; nothing touches repo_state in _mergeItem after the merge17:21
tobiashthen I'm confused why it should update it17:21
corvustobiash: you saw that happen?17:21
corvusor you're confused about why we call saverepostate?17:22
tobiashhttps://opendev.org/zuul/zuul/src/branch/master/zuul/merger/merger.py#L95417:23
tobiashI'm confused about that17:23
tobiashthat clearly pretends to mutate the repo state17:23
tobiashor is that a noop within the executor?17:23
corvustobiash: the process is slightly different depending on whether it's happening on enqueue or execution17:24
corvustobiash: yep that's the idea17:24
tobiashah ok17:24
tobiashthen I've misunderstood that17:24
corvustobiash: on enqueue, it starts empty and populates as we merge each item17:24
tobiashbased on that line I wanted to be safe ;)17:24
corvustobiash: on the executor, it starts fully populated and shouldn't change17:24
tobiashk, then you're right and we don't have to copy17:24
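Editor's sketch of why the same save call can populate the repo state on enqueue yet be a no-op on the executor. It assumes the save step only records projects that are not already present. That is an assumption made for illustration, not a description of merger.py, and `save_repo_state` is a hypothetical name.

```python
from typing import Dict

# project name -> {ref path: commit sha}
RepoState = Dict[str, Dict[str, str]]


def save_repo_state(repo_state: RepoState, project: str,
                    current_refs: Dict[str, str]) -> None:
    """Record a project's refs only if the project is not recorded yet."""
    if project in repo_state:
        # Executor case: the state arrived fully populated, leave it alone.
        return
    # Enqueue case: the state starts empty and fills up item by item.
    repo_state[project] = dict(current_refs)
```

Under that assumption, copying the repo state before merging is extra protection rather than a requirement, which matches the conclusion reached above.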
corvusok.  you could talk me into keeping the copy as extra protection against a future bug that could cause us to update the trusted repos, as long as we change the comment to say that :)17:25
corvustobiash: and thank you very much for putting the comment you did put there, that helped me realize the discrepancy :)17:26
corvusthis is tricky stuff17:27
tobiashactually I don't mind whether I remove that or change the comment, I leave the decision to you17:27
corvusokay, i'm starting work on the refreeze part of this, i'll pick one :)17:28
tobiash++17:30
avasscorvus, fungi, clarkb: wanna promote 785432 in zuul-jobs gate so the changes can start merging? :)17:33
fungiavass: sure, on it17:35
avassthanks!17:36
fungiand done17:37
fungi#status log Promoted 785432,1 in the zuul tenant's gate pipeline due to indefinitely waiting builds ahead of it17:40
openstackstatusfungi: finished logging17:40
*** rpittau|bbl is now known as rpittau17:44
openstackgerritMerged zuul/zuul-jobs master: Remove arm64 jobs (temporarily)  https://review.opendev.org/c/zuul/zuul-jobs/+/78543217:46
fungiyeah, basically we have the following nodes in linaro-us: two debian-buster-arm64 nodes in-use for over an hour, two ubuntu-focal-arm64 nodes in use for half an hour, five debian-buster-arm64 nodes deleting for the past few minutes, and a ready ubuntu-focal-arm64-xxxlarge node nothing's used for 8.5 hours yet17:49
fungier, sorry, that was for #opendev17:49
openstackgerritMerged zuul/zuul-jobs master: Add upload-logs-azure role  https://review.opendev.org/c/zuul/zuul-jobs/+/78200417:50
corvustobiash: if we go with the approach we talked about, we may lose the ability to remove broken jobs from running queue items (a feature we literally just took advantage of 30 minutes ago ^)18:11
tobiashcorvus: we still have dequeue and abandon/restore18:12
corvustobiash: right, but so much of the reconfiguration code is explicitly designed to support this behavior18:13
tobiashcorvus: we could also retain removing jobs as a special case18:13
tobiashlike temporary refreeze and remove non existing jobs from the original buildset18:14
tobiashalthough this wouldn't give us the benefit of faster reconfigs18:15
corvusi'm concerned about inconsistent configurations -- we could have a change ahead that removes a job that produces an artifact, then a change behind that depends on that and it never arrives18:16
corvusi'm having trouble seeing how we can have consistent global configuration with updates, at the same time we have consistent buildset contents18:17
corvustobiash: i think the only way we can have both would be to discard already completed builds if the repo state has changed.18:19
corvus(which is not great for, say, a publishing pipeline)18:20
corvus(i mean, i guess it would be discard already completed builds if both the layout and the repo state has changed... something like that)18:21
tobiashdiscarding in a gate pipeline would be effectively a gate reset18:23
corvusyeah18:23
tobiashthat can mean throwing away two hours of build time in our more crowded gates18:24
corvusor, perhaps we weaken the buildset repo state guarantee so that we reset it for jobs that haven't started yet, but only if necessary (because of a job config change)?18:24
corvusi just thought of another issue -- when we add new items to a gate pipeline, we use the layout from the item ahead if it's been speculatively updated; if we don't update that after a reconfig, then we could keep using an old layout perpetually as we keep enqueuing items18:28
tobiashI think that could be updated without refreezing18:29
tobiashso the already frozen jobs run as is and later items start with an updated layout?18:30
corvusyeah, if that's what we wanted to do, i suppose it's possible.  it would mean that we accommodate having frozen jobs that don't match their layout.18:31
corvusi don't love that idea; this is getting really complicated.18:31
tobiashso you'd do the refreezing combined with repo state refresh if the job config changed?18:35
corvusi think that's one option.  it supports most of our goals, except the buildset repo state in some circumstances18:36
tobiashI guess we basically have two choices, either ditch updates on buildsets (and thus don't remove jobs) or do the weakened repo state18:36
corvustrying to take some notes: https://etherpad.opendev.org/p/xtM3zRa7-xe5RkNejGOq18:37
tobiashwhat do you mean with configuration consistency?18:39
mordredcorvus, tobiash: "on configuration change" - I'm playing catch up - which type of config change do we mean here - like landing a non-speculative change?18:40
tobiashmordred: any tenant reconfig, e.g. removing/adding jobs from a pipeline18:41
mordrednod18:41
mordredthanks18:41
corvustobiash: two things: that a queue item's layout is based on the layout ahead of it (or the pipeline layout), and that a user can know what zuul is running by inspecting the repos and changes.18:41
corvustobiash: i'm not worried about reconfiguration time; i'm much more worried about correctness18:42
tobiashI think both options can be correct (if we don't care about real time job removal/addition)18:43
corvusi'm just saying i don't think it's a goal here and listing it isn't helping me evaluate the options18:44
tobiashjust wanted to mention that since reconfig times are still one of our biggest problems18:45
corvusunderstood18:46
corvus"configuration consistency needs updating layout without refreezing (can add complexity)" what does that mean?18:46
corvusoh, the thing about carrying around layouts on queue items without refreezing them18:46
corvusi think i got it18:46
tobiashyes18:46
tobiashfeel free to rephrase it :)18:47
tobiashsomehow that feels like a decision of real-time job updates vs repo state guarantee18:51
corvusi'd like to be really clear that real-time updates have been an explicit goal we have done a huge amount of work across many years to enable18:52
corvusit's not an accident.  and the fact that a user can actually know what zuul is running at any point in time by inspecting the repos and changes is important18:53
tobiashthen I think the best compromise is #118:53
corvusmaybe; but i haven't lost hope on #2 :)18:55
corvusi'm exploring what configuration consistency means18:55
corvuswhat are the ways changes can affect each other: provides/requires, hold-following-changes, semaphores?18:56
clarkbI haven't quite followed along, but wouldn't a git change to a parent changing jobs (to say remove or add one) imply a gate reset anyway?18:58
clarkbI'm trying to think of a situation where you wouldn't end up with a reset due to git changes18:58
tobiashit doesn't atm and looking at base job changes gate-resetting all gates is not viable18:59
clarkboh I see, it is dependencies outside of the gate dag18:59
tobiashyes18:59
corvusclarkb: nope, we could remove the tox-docs job from all repos in project-config, and as soon as that merges, poof it disappears from all running items without disruption.18:59
corvusright18:59
corvusbeing able to make those changes without resetting a day-long pipeline is really the driver for how reconfiguration works in zuul19:01
clarkbya I was missing that that was something that already happens TIL19:01
corvusand i agree, i don't think we want to add more gate resets :)19:01
tobiashcorvus: how important is the live-adding of new jobs compared to the live-removal?19:03
tobiashI think answering this question could help us to judge between #1 and #319:03
corvustobiash: i am certain we have used both in the past in openstack.  it may be that it's more important when a project is growing and less important when things have stabilized.  it may also be more important with centralized config and less important with distributed config.19:04
corvusi suspect that my personal experience has drifted to the later part of both of those continuums, so if you asked me to weigh those right now, i would probably say that adding is not as important as removing, and that even removing may not be critical.19:06
corvusbut it's hard for me to say in general, because that's just based on my own experience; another zuul user might be going through a growth phase in a centrally controlled environment where both of those are really important19:07
corvusor maybe no one even understands that's how it works and they'd be surprised the buildset repo state could be inconsistent :)19:07
corvusam i being helpful? :)  don't answer that.19:08
tobiashour users were surprised that the repo state could be inconsistent at least ;)19:08
clarkbthinking out loud: could keep the set of jobs (and other related items) consistent with enqueue time unless a reset happens then update19:09
clarkbthat gives users an out (though not the easiest one to take advantage of)19:09
corvusclarkb: i think that's basically #2?19:09
clarkboh there is an etherpad /me opens19:10
tobiashyeah, that's #219:10
tobiashrelated to that I think we should write the repo state into a file in the job logs dir19:10
corvusthe reason i was asking about provides/requires and other things is to try to figure out how important configuration consistency is.  like, if we go with #2, is there ever a case where we could refreeze an item and have that adversely affect another item which was not refrozen?19:11
corvustobiash: i agree; i actually think that's the right way to reproduce a build.19:11
tobiashregarding provides/requires I'm not sure but I think that would be racy now already depending if the jobs have already been started or not?19:12
tobiashregarding semaphores I don't think there is an issue since they are locked right before job startup and unlocked after finish in any case19:13
corvustobiash: well, if you remove a provides job, and refreeze a requires job, then the refrozen job will no longer wait on the provides; i don't think there's a race there.  if the requires has started already, then that means the provides has finished, so no problem.19:14
corvusin a gate pipeline, if you removed a provides and it refroze, that would only happen if it reset and the provides would be behind it, and so it would get reset too and refreeze (this is still exploring option #2)19:15
corvusand in a check pipeline under #2, none of the items would ever refreeze, right?19:16
tobiashyes19:16
corvusregarding hold-following-changes, if you removed a hold-following-changes job, it would continue to hold following changes until the item that was holding was refrozen without the job.  likewise adding that feature.  i think that means if you added it to a gate pipeline, it may start showing up randomly.  like, each time an item in gate reset, it would get the hold-following-changes flag and start holding19:19
corvusthings behind it.  and then later if a change ahead of it reset, it would get the flag and start holding things behind it.19:19
corvusthat might be a little weird, but maybe that behavior is okay in that circumstance?19:20
tobiashI think that would be ok19:21
corvusregarding semaphores -- that could be a little more problematic, in that adding a semaphore to a job wouldn't take effect until all existing runs of that job had completed.  that could be a little problematic depending on exactly what the job was doing (but if you're only adding use of a contentious resource in the same change you're adding the semaphore, then the old versions of the job wouldn't be using19:23
corvusthe resource anyway).  anyway, potentially problematic depending on details.  removing a semaphore is probably not a big deal (you'd have extra locking for a little while until the old jobs finished)19:23
corvustobiash: i think those are the sort of things i had in mind with configuration consistency; but i think those are potentially acceptable behaviors in the case of #219:28
corvustobiash: i do have a slight concern that since #2 is a fundamental change to how reconfiguration works, there may be a large number of tests that need to be updated.19:36
*** jfoufas1 has quit IRC19:37
tobiashmaybe we could also go with 1 or 3 and evaluate 2 later in more depth19:38
corvustobiash, clarkb, mordred: how about we proceed thusly: 1) revert buildset repo state; 2) attempt to implement option #2 in a change; if it appears viable without a rewrite of zuul and/or its test suite, 3) we ask zuul-discuss if anyone objects to the behavior change (the things we noted above plus obviously the lack of real-time job add/remove).  if either the technical challenges are too complex or there19:39
corvusare people dependent on current behavior, we look at options 1 or 3?19:39
corvussorry i was almost done typing that when tobiash said his thing :)19:39
corvusi think i'm on board with tobiash's "neither of these is incorrect" idea.  i mostly don't want to burn time on option 1/3 if everyone really wants #2 :)19:41
tobiashthe plan sounds viable20:02
clarkbthat seems like a reasonable plan to me20:03
avasswhat actually happens with a requires right now if all buildsets providing the artifacts have already completed before the requires is enqueued? it just doesn't get any artifacts?20:08
corvusavass: they get them; zuul looks them up in the database20:08
avasscool20:09
corvusclarkb: want to +3 https://review.opendev.org/785427 ?20:10
avassit would be cool if provides/requires, files matchers and probably other things could be configurable in a project template somehow20:11
clarkbcorvus: ya let me just confirm the revert looks about right20:11
clarkbdone20:12
corvusavass: i don't see why they wouldn't be (or maybe i don't understand)20:12
corvus(almost everything you can do in a job you can do in a project, and anything you can do in a project you can do in a project-template)20:12
corvusclarkb: do you think you might have time to review https://review.opendev.org/783726 today?20:13
avasscorvus: can you modify a job in a project template to "require: X" or "provide: Y" when using it? like the provides/requires is not part of the template itself but the jobs are20:14
corvusavass: i believe so20:14
*** rlandy|rover is now known as rlandy|rover|afk20:14
clarkbcorvus: yes, I'm about to declare that gerrit account stuff will have to wait for tomorrow (I don't like doing big things like that in the afternoon and then disappearing for dinner, and doing those updates is not fast)20:15
corvusavass: i think that will even work as expected with the sql query :)20:16
corvusclarkb: cool, i think if we can merge the revert, that change, and then this trivial reno: https://review.opendev.org/785054 , then we can restart opendev tomorrow and release next week20:17
avassI suppose you just override the job in the project stanza. I wonder what would happen if the project template does that as well to set variables, like the project template has something like "jobs: [a{vars: {my_var: 1}}]" and the project stanza does "jobs: [a{requires: [X]}]". would it still have the variable set in that case?20:19
*** dmsimard6 has joined #zuul20:24
*** tosky_ has joined #zuul20:25
*** dmsimard has quit IRC20:26
*** dmsimard6 is now known as dmsimard20:26
*** bschanzel_ has quit IRC20:26
*** tosky has quit IRC20:26
*** bschanzel has joined #zuul20:27
corvusavass: every level of the system adds to the inheritance path; so what really happens is that the job defined in the project inherits from the one in the project-template which inherits from the one in the job definition, etc...  so in your example, both things end up there.  and if you added two different 'provides' to both then it will end up providing both things.20:28
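The inheritance path corvus describes can be pictured with a toy Python model. This is only a sketch of the idea, not Zuul's implementation: the function name `apply_variants`, the dictionary layout, and the "later variant wins on a variable collision" rule are assumptions made for illustration. It folds avass's two variants on top of an empty job definition and adds a second `provides` to show the union behaviour.

```python
def apply_variants(*variants: dict) -> dict:
    """Fold job variants in inheritance order (job definition first).

    Hypothetical toy model: vars are merged key by key, while
    provides/requires accumulate as an ordered union.
    """
    result = {"vars": {}, "provides": [], "requires": []}
    for variant in variants:
        # Variables accumulate; a later variant wins on a key collision
        # (an assumption of this sketch).
        result["vars"].update(variant.get("vars", {}))
        # provides/requires accumulate, keeping first-seen order.
        for key in ("provides", "requires"):
            for item in variant.get(key, []):
                if item not in result[key]:
                    result[key].append(item)
    return result


job_definition = {}                                            # job "a" as defined
template_variant = {"vars": {"my_var": 1}, "provides": ["Y"]}  # project-template
project_variant = {"requires": ["X"], "provides": ["Z"]}       # project stanza

final = apply_variants(job_definition, template_variant, project_variant)
```

In this model `final` keeps `my_var` from the template, gains `requires: [X]` from the project, and provides both `Y` and `Z`, which matches the outcome described in the message above.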
*** tosky_ is now known as tosky20:28
clarkbcorvus: one question on the zk race change20:31
avasscorvus: cool, I wasn't sure how it resolved that. :)20:33
corvusclarkb: replied20:40
clarkbcorvus: thanks I have approved the change20:42
corvusclarkb: cool, i also just left an addendum comment with additional info20:42
clarkbI've also approved the release note20:42
corvusthanks!20:45
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503920:56
*** rpittau is now known as rpittau|afk21:17
openstackgerritMerged zuul/zuul master: Revert "Make repo state buildset global"  https://review.opendev.org/c/zuul/zuul/+/78542721:23
*** sshnaidm|afk is now known as sshnaidm|off21:24
*** fsvsbs has joined #zuul21:36
fsvsbsHi, quick question: I have the zuul_console streaming into /tmp/uuid.log, but it is not appearing in the Zuul web UI console. Am I missing something that I need to configure? The nodepool node is static at the moment.21:39
fsvsbsIt is late in the UK so hope I can catch up with you in the morning on this21:40
corvusfsvsbs: first thing to check is probably whether the log streaming port is open on the static node21:48
corvusfsvsbs: tcp 19885 is the port number21:49
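One way to run the check corvus suggests is a plain TCP connection attempt made from the machine that needs to reach the node. The sketch below uses only the Python standard library; the helper name `port_is_open` and the host name in the comment are placeholders, and a successful connection only shows that something is listening on the port, not that it is the Zuul console streamer.

```python
import socket


def port_is_open(host: str, port: int = 19885, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (placeholder host name):
#   port_is_open("static-node.example.org")
```

A False result usually points at a firewall rule or at the console daemon not running on the node.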
openstackgerritMerged zuul/zuul master: Fix ZK-related race condition in github driver  https://review.opendev.org/c/zuul/zuul/+/78372621:54
*** y2kenny has joined #zuul21:57
openstackgerritMerged zuul/zuul master: Add a checkpoint release note  https://review.opendev.org/c/zuul/zuul/+/78505422:00
y2kennyI have been trying to run a list of shell commands with the shell module using "with_items", with some of the items using sudo/become.  All the commands seem to run, but for some reason, for the sudo commands, I see "PermissionError: [Errno 13] Permission denied" and "ValueError: No start of json char found"22:00
y2kennyhttp://paste.openstack.org/show/oZQBVtt5UMdUE5COGlvQ/22:00
y2kennydoes any one know what could cause this?22:01
*** hamalq has quit IRC22:07
*** hamalq has joined #zuul22:07
corvusy2kenny: i haven't seen that, but i can tell you that /tmp/console-0a91b40d-bdbf-0d23-f891-0000000000a5-baremetal.log is part of the zuul log streaming which relies on custom ansible plugins.  also, the ValueError exception doesn't come from zuul.  and finally, i think there is no testing of the ansible async module with zuul.22:14
corvustobiash: i've done an initial triage of failing tests with option 2; it's about 13, and at first glance, i'd say about half of those we would probably just remove because they confirm the live reconfiguration behavior.  i'll do a more in-depth triage next.22:15
y2kennycorvus: I am suspecting it's an ansible thing but I just want to double check.22:16
corvustobiash: list in etherpad22:16
corvusy2kenny: very well could be, but there's definitely an intersection with zuul there, so also could be zuul, or zuul+ansible; hopefully those bits of info add context22:17
corvusy2kenny: if it's at all possible to try something similar without async, i'd give that a shot to narrow things down22:18
corvusy2kenny: an immediate thought occurs to me: what if the async task starts by executing a command with elevated privs, and the zuul console log gets created as root, and the later async calls that "check in" on the process don't run as root and don't have access.22:18
corvusno idea if that is a logically consistent theory -- just brainstorming :)22:19
y2kennycorvus: that's an interesting idea...22:19
y2kennyI will try to disable the async and see what happens22:20
y2kennycorvus: disabling async appears to have eliminated the ValueError.  The log error persists.22:42
y2kennythe error related to the tmp/log I mean22:43
y2kennycorvus: I have an idea to test your theory... I will get back to you22:45
y2kennycorvus: you are probably right about the stream log.  If I fix the entire task to "become: True" instead of switching between true and false depending on the item (with_items), the stream log doesn't give an error22:57
openstackgerritJames E. Blair proposed zuul/zuul master: Fix missing repo state restore  https://review.opendev.org/c/zuul/zuul/+/78531023:29
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Revert "Revert "Make repo state buildset global""  https://review.opendev.org/c/zuul/zuul/+/78553523:29
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Keep jobgraphs frozen across reconfiguration  https://review.opendev.org/c/zuul/zuul/+/78553623:29
corvusy2kenny: cool -- i mean, bummer about the error, but i'm glad we could make progress :)23:29
y2kennycorvus: so is zuul stream log the third party plugin or did you mean something else?23:30
y2kennycorvus: I am wondering where I can look to try to find the bug23:30
y2kennycorvus: I am wondering if this is just a matter of setting the right mode for the stream log file23:32
y2kenny(like a+rw)23:32
corvustobiash, clarkb, mordred: ^ i did a more in-depth triage of the tests which would be affected; i think the #2 approach is technically feasible, and i didn't see any major gotchas during my triage.  i think the main new thing is we need to address behavior when a project is removed from a tenant.  but i think that's a tractable problem.  i uploaded a change with my triage notes inline in the tests as23:33
corvuscomments.  so if you want to take a look at that and you agree, then i think next step is email to zuul-discuss to raise the behavior changes.23:33
corvusy2kenny: the zuul log stream is an ansible plugin that zuul automatically installs; it's in the zuul/ansible dir in the zuul repo.  i'm unsure if there is a reason the file is not created world-readable, or if it was just left to the default and it sometimes ends up that way due to a restricted umask.23:36
corvusy2kenny: to be specific, i think it's the custom command action module we're talking about23:36
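The umask effect corvus mentions can be reproduced outside Zuul. The sketch below is a standalone POSIX illustration, not code from the Zuul plugin: the helper name `create_with_umask` and the requested mode of 0o666 are assumptions. It shows that the same create call yields a file readable only by its owner under a restrictive umask, which would explain a later non-root process getting "Permission denied" on a log created by root.

```python
import os
import stat


def create_with_umask(path: str, umask: int) -> int:
    """Create an empty file at path under the given umask; return its mode bits."""
    old = os.umask(umask)  # set the process umask, remembering the previous one
    try:
        # Request rw for everyone; the kernel clears the bits set in the umask.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o666)
        os.close(fd)
    finally:
        os.umask(old)  # always restore the original umask
    return stat.S_IMODE(os.stat(path).st_mode)
```

With a umask of 0o077 the file comes out as 0o600 (owner only), while 0o022 gives 0o644 (world-readable), so a log written by a privileged process with a tight umask is unreadable to other users unless its mode is set explicitly.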
y2kennyok23:37
*** tosky has quit IRC23:38
*** ajitha has joined #zuul23:44
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503923:50
*** Goneri has quit IRC23:59
