Friday, 2023-02-17

-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 872908: Cleanup old rebase-merge dirs on repo reset https://review.opendev.org/c/zuul/zuul/+/872908 06:36
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 874096: Update reconfig event ltime on (smart) reconfig https://review.opendev.org/c/zuul/zuul/+/874096 07:39
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 872908: Cleanup old rebase-merge dirs on repo reset https://review.opendev.org/c/zuul/zuul/+/872908 09:17
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 802255: Use optional upload script for uploading an image https://review.opendev.org/c/zuul/nodepool/+/802255 09:52
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692 10:14
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692 10:15
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692 10:17
@q:fricklercloud.de can we get a new tag for zuul soon? seems there are a lot of unreleased fixes like https://review.opendev.org/c/zuul/zuul/+/872519 which just hit us after upgrading to 8.1.0 11:51
@q:fricklercloud.de how do other deployments handle this, do you all run from latest like #opendev? or do you build your own images? 12:03
@q:fricklercloud.de fwiw, trying with :latest containers gives me: 12:19
`2023-02-17 12:15:40,351 ERROR zuul.WebServer: ValueError: ('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [_OpenSSLErrorWithText(code=75497580, lib=9, reason=108, reason_text=b'error:0480006C:PEM routines::no start line')])`
@q:fricklercloud.de disregard the last msg, that was a local deployment issue 12:25
-@gerrit:opendev.org- Marvin Becker proposed: [zuul/nodepool] 873716: Add gpu support for k8s/openshift pods https://review.opendev.org/c/zuul/nodepool/+/873716 13:37
@jim:acmegating.com i think next week (after the next opendev restart) would be a good time for a release assuming everything still looks good 14:52
@jpew:matrix.org Any chance https://review.opendev.org/c/zuul/zuul/+/873742 could make the next release? 15:06
-@gerrit:opendev.org- Marvin Becker proposed: [zuul/nodepool] 873716: Add gpu support for k8s/openshift pods https://review.opendev.org/c/zuul/nodepool/+/873716 16:05
@clarkb:matrix.org jpew: corvus I left a +2 on 873742 but noted a corner case where I think we'll still do the wrong thing. It should be less frequent than with the prior patchset though, hence my +2 if this gets most things moving along again 16:46
@jim:acmegating.com Clark: jpew that's more of a new behavior than a regression bugfix -- i don't think we should rush it 17:06
@jpew:matrix.org I thought Zuul never merged empty commits? 17:06
@jpew:matrix.org corvus: Our deploy pipelines are broken :/ 17:07
@jim:acmegating.com jpew: yeah i get the urgency for you but they've always been broken 17:07
@jim:acmegating.com this has some pretty serious implications for version handling, especially with tags 17:07
@jim:acmegating.com empty commits are actually one of the only ways you can get certain consistent behavior in zuul with tags 17:08
@jim:acmegating.com so anyway, let's please not rush this. in the meantime you can run it with your patch locally applied 17:08
@jpew:matrix.org Fair enough..... does it need to be a property of the event then instead? 17:10
@jim:acmegating.com (i wrote a bit more about the tag case on the change btw) 17:10
@jpew:matrix.org Or I guess if FETCH_HEAD is empty, we can skip removing it 17:12
@jim:acmegating.com jpew: re event property: i think that might be one option (and i think Clark suggested something like that as a possibility) -- but realistically, the data model in zuul requires this be a property of the Change(Ref) object, so if it originates at the event, it needs to end up in the Change somehow. so some kind of a merged flag might help. re fetch_head: that sounds promising too... 17:17
@jpew:matrix.org The FETCH_HEAD one is pretty simple, I'll work it up quick after lunch 17:17
@clarkb:matrix.org re not merging empty commits: you can still push a merge commit and land that and have it work out properly I think. It's just that git handles the merge situation for us and we don't have to think about it much 17:18
@jim:acmegating.com Clark: aiui the change from jpew would unwind that even in a check pipeline; that makes me uncomfortable 17:19
@jim:acmegating.com to try to clarify: 1) empty commits are vital for some operations; 2) we don't want to alter the behavior of empty commits in check/gate 17:20
@jpew:matrix.org Clark: Cherry-pick already has a special case for merges 17:20
@jim:acmegating.com Clark: oh i see i misunderstood what you wrote 17:21
@jpew:matrix.org (which this doesn't change) 17:21
@clarkb:matrix.org > <@jim:acmegating.com> Clark: aiui the change from jpew would unwind that even in a check pipeline; that makes me uncomfortable 17:21
Yes I think that is correct. But it would only affect the case where someone pushes an explicit empty commit. I have no idea if anyone does that, but being cautious seems fine.
@clarkb:matrix.org it's definitely a corner case. But one that is possible 17:21
@jim:acmegating.com but yeah, i still don't think we want to require someone make a merge in order to cherry-pick an empty commit 17:21
@jim:acmegating.com i'm saying it's not a corner case 17:21
@jim:acmegating.com i'm saying it's absolutely 100% critical to certain workflows 17:21
@jpew:matrix.org You push direct to git, or push an empty commit to gerrit 17:21
@clarkb:matrix.org ok I wasn't aware of anyone using empty commits. But if they are then yes this would be problematic 17:22
@jim:acmegating.com yep i wrote about it a bit on the change -- it's the only way to get consistent behavior from tag-based pipelines on projects with multiple branches which merge back into master. 17:22
@jpew:matrix.org Ya, makes sense 17:23
@jim:acmegating.com (even if it weren't an important existing use case, we should always be careful/skeptical and try to minimize the difference between what zuul does and what the code review system will do) 17:24
@jim:acmegating.com (and to be clear, i think jpew is also trying to achieve that in the change being worked on -- it's a fine line we're going to have to walk here :) 17:25
@jpew:matrix.org corvus: Although ostensibly, cherry-pick mode with an empty commit is already broken today (because it doesn't pass --allow-empty) :) 17:29
@jpew:matrix.org But, it makes sense to fix it at the same time I suppose 17:30
@jim:acmegating.com yeah, now that we're thinking about it, we should be thorough :) 17:34
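A minimal sketch of the two kinds of "empty" being discussed, assuming the merger drives plain `git` (the helper names and the `keep_redundant` switch are hypothetical, not Zuul's merger code). A commit is originally empty when its tree matches its parent's tree; `git cherry-pick --allow-empty` preserves such commits instead of failing, while `--keep-redundant-commits` additionally keeps commits that only become empty because their changes already exist on the target branch.

```python
from typing import List, Optional


def is_empty_commit(tree_id: str, parent_tree_id: Optional[str]) -> bool:
    """Return True if a commit records the same tree as its first parent.

    Tree ids could come from e.g. `git rev-parse <sha>^{tree}`.
    A root commit (no parent) is treated as non-empty in this sketch.
    """
    return parent_tree_id is not None and tree_id == parent_tree_id


def cherry_pick_args(sha: str, keep_redundant: bool = False) -> List[str]:
    """Build a `git cherry-pick` command line (hypothetical helper)."""
    # --allow-empty: don't fail on commits that were already empty in the
    # original history, so intentionally empty commits survive the pick.
    args = ["git", "cherry-pick", "--allow-empty"]
    if keep_redundant:
        # --keep-redundant-commits: also record commits that become empty
        # because their changes are already present on the target branch.
        args.append("--keep-redundant-commits")
    args.append(sha)
    return args


print(cherry_pick_args("abc123"))
# ['git', 'cherry-pick', '--allow-empty', 'abc123']
```

Keeping `--keep-redundant-commits` opt-in here mirrors the caution above about not changing how check/gate treat empty commits; it is one possible split, not the approach the change settled on.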
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 872364: Switch our local testing docker-compose to mysql 8.0 https://review.opendev.org/c/zuul/zuul/+/872364 17:55
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 873470: Match events to pipelines based on topic deps https://review.opendev.org/c/zuul/zuul/+/873470 18:12
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul] 873742: merger: Keep redundant cherry-pick commits https://review.opendev.org/c/zuul/zuul/+/873742 19:22
@jpew:matrix.org Ok. Fixed I think. Also added a test case to make sure that actually empty commits are preserved 19:22
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 872368: Allow default-ansible-version to be an int https://review.opendev.org/c/zuul/zuul/+/872368 19:30
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 874096: Update reconfig event ltime on (smart) reconfig https://review.opendev.org/c/zuul/zuul/+/874096 19:30
@goneri:matrix.org Hi, we often face a situation where the job fails before the logs are uploaded. To simplify the troubleshooting, I started an app that collects all the logs as soon as the jobs start. https://github.com/goneri/the_zuul_watcher 20:55
@fungicide:matrix.org normal job failures shouldn't interfere with log uploads. sometimes there are failures uploading logs though, or nodes crashing and becoming unreachable before logs can be collected from them. one of the things we talked about was having the zuul info upload first in a separate task. it might also be possible for executors to save and stream console logs and json locally and upload them before collecting log files from job nodes, worth looking into 21:00
@jim:acmegating.com fungi: even the failure to fetch logs should not interfere with log uploads in a properly constructed job. the base job should have log uploads in the last post-run playbook, and it should have log fetching in the second-to-last post-run playbook. opendev's base jobs are constructed this way, and i recommend that as a pattern to follow. 21:14
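The post-run layout described above can be expressed as a small check. This sketch models a base job's post-run playbooks as an ordered list of role-name lists; `fetch-output` and `upload-logs` are used as illustrative role names (both exist in zuul-jobs, but the check only cares about the sets below), and `check_post_run_layout` is a hypothetical helper, not part of Zuul.

```python
from typing import List, Sequence

FETCH_ROLES = {"fetch-output"}  # illustrative: roles that copy logs off the nodes
UPLOAD_ROLES = {"upload-logs"}  # illustrative: roles that publish logs


def check_post_run_layout(post_run: Sequence[Sequence[str]]) -> List[str]:
    """Return a list of problems with a base job's post-run playbook order.

    Each element of ``post_run`` is the list of roles in one playbook,
    in the order the playbooks are assumed to run.
    """
    if len(post_run) < 2:
        return ["fetch and upload should be in separate post-run playbooks"]
    problems = []
    last, second_last = post_run[-1], post_run[-2]
    # The final playbook should do nothing but upload.
    if not last or not set(last) <= UPLOAD_ROLES:
        problems.append("last post-run playbook should only upload logs")
    # Fetching belongs in the playbook right before the upload.
    if not FETCH_ROLES & set(second_last):
        problems.append("second-to-last post-run playbook should fetch logs")
    # Uploading earlier than the final playbook defeats the separation.
    for i, playbook in enumerate(post_run[:-1]):
        if UPLOAD_ROLES & set(playbook):
            problems.append(f"playbook {i} uploads logs before the final playbook")
    return problems


print(check_post_run_layout([["fetch-output"], ["upload-logs"]]))
# []
print(check_post_run_layout([["fetch-output", "upload-logs"]]))
# ['fetch and upload should be in separate post-run playbooks']
```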
@goneri:matrix.org Well, for instance we recently had a situation where Zuul was able to connect to some of the nodes https://paste.centos.org/view/35afb08b; it was a misconfigured provider. The application above was useful to troubleshoot the problem. 21:17
@jim:acmegating.com Gonéri 🇺🇦: that should have still appeared in the job-output.txt and been uploaded in the final post-run playbook, assuming the pattern i described above 21:19
@jim:acmegating.com Gonéri 🇺🇦: but your program would certainly be useful in the case that it couldn't reach the log server for upload. those incidents should be recorded in the executor log, but this would make it easier for a non-admin to troubleshoot 21:20
@jim:acmegating.com Gonéri 🇺🇦: thanks for sharing -- i do see your program as useful. :) i also know that some people misconfigure zuul (for example, putting log uploads in the run playbook, or combining them in a playbook with other roles/tasks, or putting them in the cleanup playbook) and in those cases, they might see more "log failures" than would normally be expected. so i want to highlight that in case other factors are at play. 21:22
@goneri:matrix.org I don't think the base job even started in my case. The job was failing before that. 21:24
@jim:acmegating.com that might arguably be a bug in handling job setup failures then 21:25
@jim:acmegating.com (i say arguably because if we can't set up, the right choice might be to do nothing) 21:27
@jim:acmegating.com Gonéri 🇺🇦: anyway, thanks for sharing the problem description and your solution :) 21:28
@clarkb:matrix.org one use case that could be useful for is when jobs crash, say with nested virt 21:31
@clarkb:matrix.org being able to create a record outside of the browser easily would be nice for those situations 21:32
@jim:acmegating.com Clark: how so? why wouldn't that be in the normal job-output.txt? 21:32
@clarkb:matrix.org corvus: I seem to recall it wasn't, and the last time we had this going on users were asked to keep streams open to get the data 21:35
@clarkb:matrix.org I don't know why that was the case though, but definitely recall people needing to open the various console streams for the jobs they were interested in 21:35
@jim:acmegating.com Clark: that seems weird to me because the web stream is just a stream of the job-output.txt file, so the only reason not to have that at the end of the job is that uploading failed for some reason. shouldn't have anything to do with the remote node after the job starts (Gonéri 🇺🇦 found a case where it's a problem before the job starts) 21:36
@jim:acmegating.com Clark: even a timeout shouldn't affect that, as we give each post-run playbook its own timeout 21:38
@clarkb:matrix.org ya I may have conflated issues too. I remember people doing that for some reason though 21:38
@jim:acmegating.com well, it can sure be handy when things go really wrong :) 21:39
@clarkb:matrix.org maybe it was when tripleo had post-run timeouts which might have broken uploads or at least recording of where to find them 21:39
@jim:acmegating.com maybe that was before the split into 2 post-run playbooks? 21:39
@jim:acmegating.com or maybe the upload was timing out trying to upload way too much data? 21:40
@clarkb:matrix.org ya in tripleo's case it was processing the logs before uploading, taking tons of time iirc. Which would be before splitting things I guess 21:41
@jim:acmegating.com yeah, the split into 2 playbooks solves a lot of probs 21:41
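A toy model of why separate post-run playbooks help: each playbook runs under its own timeout, so slow log processing can time out without consuming the time budget of the final upload. This is an asyncio simulation with made-up phase names, durations, and timeouts; it does not model Ansible or the Zuul executor.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_phase(duration: float) -> None:
    # Stand-in for running a playbook that takes `duration` seconds.
    await asyncio.sleep(duration)


async def run_post_run(phases: List[Tuple[str, float, float]]) -> Dict[str, str]:
    """Run (name, duration, timeout) phases in order, each with its own timeout.

    A timeout in one phase is recorded but does not stop later phases.
    """
    results: Dict[str, str] = {}
    for name, duration, timeout in phases:
        try:
            await asyncio.wait_for(fake_phase(duration), timeout=timeout)
            results[name] = "ok"
        except asyncio.TimeoutError:
            results[name] = "timed out"
    return results


phases = [
    ("fetch-and-process-logs", 0.2, 0.05),  # too slow: exceeds its own timeout
    ("upload-logs", 0.01, 0.1),             # still runs with a fresh budget
]
print(asyncio.run(run_post_run(phases)))
# {'fetch-and-process-logs': 'timed out', 'upload-logs': 'ok'}
```

With one combined playbook and a single timeout, the slow processing phase would have used up the whole budget before the upload started, which is consistent with the tripleo case described above.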
@jim:acmegating.com still, maybe ensuring that the upload does the job-output.txt first might not be a bad idea 21:41
@jim:acmegating.com (and maybe in doing so, it could set the return value early too) 21:42
@jim:acmegating.com but that gets complex for the synthetic index generation phase 21:42
@clarkb:matrix.org there were a couple things when fungi and I started looking at that. One was the indexes iirc. I forget the other 21:42
@clarkb:matrix.org it's doable but needs effort 21:42
@jim:acmegating.com so maybe that needs to be part of the upload routine in ansible python itself 21:42
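One way to picture "upload job-output.txt first": sort the upload queue so the console log goes out before everything else, and report its URL as soon as it is uploaded (standing in for setting the return value early). A minimal sketch, assuming hypothetical `upload` and `report_url` callbacks; it is not how the zuul-jobs upload roles work, and it skips synthetic index generation, which the discussion identifies as the hard part.

```python
from typing import Callable, Iterable, List

PRIORITY_FILES = ("job-output.txt",)  # files to publish before everything else


def is_priority(path: str) -> bool:
    # Match on the file name only, e.g. "logs/job-output.txt".
    return path.rsplit("/", 1)[-1] in PRIORITY_FILES


def upload_order(paths: Iterable[str]) -> List[str]:
    """Put priority files first, keeping the relative order of the rest."""
    paths = list(paths)
    return [p for p in paths if is_priority(p)] + [p for p in paths if not is_priority(p)]


def upload_all(
    paths: Iterable[str],
    upload: Callable[[str], str],
    report_url: Callable[[str], None],
) -> None:
    reported = False
    for path in upload_order(paths):
        url = upload(path)
        if not reported and is_priority(path):
            # Report early so the console log location is known even if
            # later uploads fail or time out.
            report_url(url)
            reported = True


files = ["logs/etc/hosts", "logs/job-output.txt", "logs/index.html"]
print(upload_order(files))
# ['logs/job-output.txt', 'logs/etc/hosts', 'logs/index.html']
```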
@fungicide:matrix.org oh. yes i forgot we'd already merged the change to upload the basic logs first in a separate playbook 21:45
@goneri:matrix.org We also had a case where gather_facts was timing out. It was the same: without the websocket opened at the right time, it was hard to troubleshoot the problem. 21:48
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul] 873742: merger: Keep redundant cherry-pick commits https://review.opendev.org/c/zuul/zuul/+/873742 22:47
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692 22:55
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 872908: Cleanup old rebase-merge dirs on repo reset https://review.opendev.org/c/zuul/zuul/+/872908 23:07
