Friday, 2023-02-17

-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 872908: Cleanup old rebase-merge dirs on repo reset https://review.opendev.org/c/zuul/zuul/+/872908		06:36
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 874096: Update reconfig event ltime on (smart) reconfig https://review.opendev.org/c/zuul/zuul/+/874096		07:39
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 872908: Cleanup old rebase-merge dirs on repo reset https://review.opendev.org/c/zuul/zuul/+/872908		09:17
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 802255: Use optional upload script for uploading an image https://review.opendev.org/c/zuul/nodepool/+/802255		09:52
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692		10:14
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692		10:15
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692		10:17
@q:fricklercloud.de	can we get a new tag for zuul soon? seems there are a lot of unreleased fixes like https://review.opendev.org/c/zuul/zuul/+/872519 which just hit us after upgrading to 8.1.0	11:51
@q:fricklercloud.de	how do other deployments handle this, do you all run from latest like #opendev? or do you build your own images?	12:03
@q:fricklercloud.de	fwiw, trying with :latest containers gives me:	12:19
`2023-02-17 12:15:40,351 ERROR zuul.WebServer: ValueError: ('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [_OpenSSLErrorWithText(code=75497580, lib=9, reason=108, reason_text=b'error:0480006C:PEM routines::no start line')])`
@q:fricklercloud.de	disregard the last msg, that was a local deployment issue	12:25
-@gerrit:opendev.org- Marvin Becker proposed: [zuul/nodepool] 873716: Add gpu support for k8s/openshift pods https://review.opendev.org/c/zuul/nodepool/+/873716		13:37
@jim:acmegating.com	i think next week (after the next opendev restart) would be a good time for a release assuming everything still looks good	14:52
@jpew:matrix.org	Any chance https://review.opendev.org/c/zuul/zuul/+/873742 could make the next release?	15:06
-@gerrit:opendev.org- Marvin Becker proposed: [zuul/nodepool] 873716: Add gpu support for k8s/openshift pods https://review.opendev.org/c/zuul/nodepool/+/873716		16:05
@clarkb:matrix.org	jpew: corvus I left a +2 on 873742 but noted a corner case where I think we'll still do the wrong thing. It should be less frequent than with the prior patchset though, hence my +2 if this gets most things moving along again	16:46
@jim:acmegating.com	Clark: jpew that's more of a new behavior than a regression bugfix -- i don't think we should rush it	17:06
@jpew:matrix.org	I thought Zuul never merged empty commits?	17:06
@jpew:matrix.org	corvus: Our deploy pipelines are broken :/	17:07
@jim:acmegating.com	jpew: yeah i get the urgency for you but they've always been broken	17:07
@jim:acmegating.com	this has some pretty serious implications for version handling, especially with tags	17:07
@jim:acmegating.com	empty commits is actually one of the only ways you can get certain consistent behavior in zuul with tags	17:08
@jim:acmegating.com	so anyway, let's please not rush this. in the meantime you can run it with your patch locally applied	17:08
@jpew:matrix.org	Fair enough..... does it need to be a property of the event then instead?	17:10
@jim:acmegating.com	(i wrote a bit more about the tag case on the change btw)	17:10
@jpew:matrix.org	Or I guess if FETCH_HEAD is empty, we can not remove it	17:12
@jim:acmegating.com	jpew: re event property: i think that might be one option (and i think Clark suggested something like that as a possibility) -- but realistically, the data model in zuul requires this be a property of the Change(Ref) object, so if it originates at the event, it needs to end up in the Change somehow. so some kind of a merged flag might help. re fetch_head: that sounds promising too...	17:17
@jpew:matrix.org	The FETCH_HEAD is pretty simple, I'll work it up quick after lunch	17:17
@clarkb:matrix.org	re not merging empty commits you can still push a merge commit and land that and have it work out properly I think. It's just that git handles the merge situation for us and we don't have to think about it much	17:18
@jim:acmegating.com	Clark: aiui the change from jpew would unwind that even in a check pipeline; that makes me uncomfortable	17:19
@jim:acmegating.com	to try to clarify: 1) empty commits are vital for some operations; 2) we don't want to alter the behavior of empty commits in check/gate	17:20
@jpew:matrix.org	Clark: Cherry-pick already has a special case for merges	17:20
@jim:acmegating.com	Clark: oh i see i misunderstood what you wrote	17:21
@jpew:matrix.org	(which this doesn't change)	17:21
@clarkb:matrix.org	> <@jim:acmegating.com> Clark: aiui the change from jpew would unwind that even in a check pipeline; that makes me uncomfortable	17:21
Yes I think that is correct. But it would only affect the case where someone pushes an explicit empty commit. I have no idea if anyone does that, but being cautious seems fine.
@clarkb:matrix.org	it's definitely a corner case. But one that is possible	17:21
@jim:acmegating.com	but yeah, i still don't think we want to require someone make a merge in order to cherry-pick an empty commit	17:21
@jim:acmegating.com	i'm saying it's not a corner case	17:21
@jim:acmegating.com	i'm saying it's absolutely 100% critical to certain workflows	17:21
@jpew:matrix.org	You push direct to git, or push an empty commit to gerrit	17:21
@clarkb:matrix.org	ok I wasn't aware of anyone using empty commits. But if they are then yes this would be problematic	17:22
@jim:acmegating.com	yep i wrote about it a bit on the change -- it's the only way to get consistent behavior from tag-based pipelines on projects with multiple branches which merge back into master.	17:22
@jpew:matrix.org	Ya, makes sense	17:23
@jim:acmegating.com	(even if it weren't an important existing use case, we should always be careful/skeptical and try to minimize the difference between what zuul does and what the code review system will do)	17:24
@jim:acmegating.com	(and to be clear, i think jpew is also trying to achieve that in the change being worked on -- it's a fine line we're going to have to walk here :)	17:25
@jpew:matrix.org	corvus: Although ostensibly, cherry-pick mode with an empty commit is already broken today (because it doesn't pass --allow-empty) :)	17:29
@jpew:matrix.org	But, it makes sense to fix it at the same time I suppose	17:30
@jim:acmegating.com	yeah, now that we're thinking about it, we should be thorough :)	17:34
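A minimal sketch of the `--allow-empty` point jpew raises above, not Zuul's merger code. It builds a throwaway repository, cherry-picks a deliberately empty commit with or without the flag, and returns git's exit code. The helper names (`git`, `cherry_pick_empty`) and the fixed commit identity are assumptions made for illustration; the function returns None when no `git` binary is found.

```python
import os
import shutil
import subprocess
import tempfile

# Hypothetical fixed identity so commits work without any global git config.
_IDENTITY = {
    "GIT_AUTHOR_NAME": "Example",
    "GIT_AUTHOR_EMAIL": "example@example.com",
    "GIT_COMMITTER_NAME": "Example",
    "GIT_COMMITTER_EMAIL": "example@example.com",
}


def git(repo, *args):
    """Run one git command inside `repo` and return the CompletedProcess."""
    return subprocess.run(
        ["git", *args],
        cwd=repo,
        env={**os.environ, **_IDENTITY},
        capture_output=True,
        text=True,
    )


def cherry_pick_empty(allow_empty):
    """Cherry-pick an intentionally empty commit; return git's exit code.

    Returns None if git is not installed.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "-q")
        # Root commit, then an empty commit on top of it.
        git(repo, "commit", "-q", "--allow-empty", "-m", "root")
        base = git(repo, "rev-parse", "HEAD").stdout.strip()
        git(repo, "commit", "-q", "--allow-empty", "-m", "empty marker")
        empty = git(repo, "rev-parse", "HEAD").stdout.strip()
        # Go back to the root and try to cherry-pick the empty commit.
        git(repo, "checkout", "-q", "--detach", base)
        flags = ["--allow-empty"] if allow_empty else []
        return git(repo, "cherry-pick", *flags, empty).returncode
```

Per the git documentation, cherry-picking an empty commit fails by default and is preserved with `--allow-empty`, so `cherry_pick_empty(False)` should return a non-zero code and `cherry_pick_empty(True)` should return 0.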
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 872364: Switch our local testing docker-compose to mysql 8.0 https://review.opendev.org/c/zuul/zuul/+/872364		17:55
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 873470: Match events to pipelines based on topic deps https://review.opendev.org/c/zuul/zuul/+/873470		18:12
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul] 873742: merger: Keep redundant cherry-pick commits https://review.opendev.org/c/zuul/zuul/+/873742		19:22
@jpew:matrix.org	Ok. Fixed I think. Also added a test case to make sure that actually empty commits are preserved	19:22
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 872368: Allow default-ansible-version to be an int https://review.opendev.org/c/zuul/zuul/+/872368		19:30
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 874096: Update reconfig event ltime on (smart) reconfig https://review.opendev.org/c/zuul/zuul/+/874096		19:30
@goneri:matrix.org	Hi, we often face a situation where the job fails before the logs are uploaded. To simplify the troubleshooting, I started an app that collects all the logs as soon as the jobs start. https://github.com/goneri/the_zuul_watcher	20:55
@fungicide:matrix.org	normal job failures shouldn't interfere with log uploads. sometimes there are failures uploading logs though. or nodes crashing and becoming unreachable before logs can be collected from them. one of the things we talked about was having the zuul info upload first in a separate task. it might also be possible for executors to save and stream console logs and json locally and upload before collecting log files from job nodes, worth looking into	21:00
@jim:acmegating.com	fungi: even the failure to fetch logs should not interfere with log uploads in a properly constructed job. the base job should have log uploads in the last post-run playbook, and it should have fetching logs in the second-to-last post-run playbook. opendev's base jobs are constructed this way, and i recommend that as a pattern to follow.	21:14
@goneri:matrix.org	Well, for instance we recently had a situation where Zuul was able to connect to some of the nodes https://paste.centos.org/view/35afb08b, it was a misconfiguration of a provider. The application above was useful to troubleshoot the problem.	21:17
@jim:acmegating.com	Gonéri 🇺🇦: that should have still appeared in the job-output.txt and been uploaded in the final post-run playbook assuming the pattern i described above	21:19
@jim:acmegating.com	Gonéri 🇺🇦: but your program would certainly be useful in the case that it couldn't reach the log server for upload. those incidents should be recorded in the executor log, but this would make it easier for a non-admin to troubleshoot	21:20
@jim:acmegating.com	Gonéri 🇺🇦: thanks for sharing -- i do see your program as useful. :) i also know that some people misconfigure zuul (for example, putting log uploads in the run playbook, or combining them in a playbook with other roles/tasks, or putting them in the cleanup playbook) and in those cases, they might see more "log failures" than would normally be expected. so i want to highlight that in case other factors are at play.	21:22
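A toy model of the pattern described above, assuming (as the discussion implies) that every post-run playbook is attempted even if an earlier one failed, while a failing task ends only its own playbook. This is not Zuul's executor code; `run_post_playbooks` and the playbook and task names are hypothetical.

```python
def run_post_playbooks(playbooks):
    """Run post-run playbooks in order and return the tasks that completed.

    `playbooks` is a list of (name, tasks) pairs; each task is a
    (label, succeeds) pair. A failing task stops the rest of its own
    playbook, but the next playbook is still attempted.
    """
    completed = []
    for _name, tasks in playbooks:
        for label, succeeds in tasks:
            if not succeeds:
                break  # remaining tasks in this playbook are skipped
            completed.append(label)
    return completed


# Recommended layout: fetch in the second-to-last playbook, upload in the last.
split = [
    ("fetch-logs", [("fetch", False)]),   # fetching from the node fails
    ("upload-logs", [("upload", True)]),  # upload still runs
]

# Misconfiguration: fetch and upload combined in a single playbook.
combined = [
    ("post", [("fetch", False), ("upload", True)]),  # upload never runs
]

print(run_post_playbooks(split))
print(run_post_playbooks(combined))
```

In this model the upload survives a fetch failure only in the split layout, which is the reason given above for keeping log upload alone in the final post-run playbook.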
@goneri:matrix.org	I don't think the base job did even start in my case. The job was failing before.	21:24
@jim:acmegating.com	that might arguably be a bug in handling job setup failures then	21:25
@jim:acmegating.com	(i say arguably because if we can't setup, the right choice might be to do nothing)	21:27
@jim:acmegating.com	Gonéri 🇺🇦: anyway, thanks for sharing the problem description and your solution :)	21:28
@clarkb:matrix.org	one use case that could be useful for is when jobs crash say with nested virt	21:31
@clarkb:matrix.org	being able to create a record outside of the browser easily would be nice for those situations	21:32
@jim:acmegating.com	Clark: how so? why wouldn't that be in the normal job-output.txt?	21:32
@clarkb:matrix.org	corvus: I seem to recall it wasn't and the last time we had this going on users were asked to keep streams open to get the data	21:35
@clarkb:matrix.org	I don't know why that was the case though, but definitely recall people needing to open the various console streams for the jobs they were interested in	21:35
@jim:acmegating.com	Clark: that seems weird to me because the web stream is just a stream of the job-output.txt file, so the only reason not to have that at the end of the job is that uploading failed for some reason. shouldn't have anything to do with the remote node after the job starts (Gonéri 🇺🇦 found a case where it's a problem before the job starts)	21:36
@jim:acmegating.com	Clark: even a timeout shouldn't affect that, as we give each post-run playbook its own timeout	21:38
@clarkb:matrix.org	ya I may have conflated issues too. I remember people doing that for some reason though	21:38
@jim:acmegating.com	well, it can sure be handy when things go really wrong :)	21:39
@clarkb:matrix.org	maybe it was when tripleo had post-run timeouts which might have broken uploads or at least recording of where to find them	21:39
@jim:acmegating.com	maybe that was before the split into 2 post-run playbooks?	21:39
@jim:acmegating.com	or maybe was timing out the upload trying to upload way too much data?	21:40
@clarkb:matrix.org	ya in tripleo's case it was processing the logs before uploading taking tons of time iirc. Which would be before splitting things I guess	21:41
@jim:acmegating.com	yeah, the split into 2 playbooks solves a lot of probs	21:41
@jim:acmegating.com	still, maybe ensuring that the upload does the job-output.txt first might not be a bad idea	21:41
@jim:acmegating.com	(and maybe in doing so, it could set the return value early too)	21:42
@jim:acmegating.com	but that gets complex for the synthetic index generation phase	21:42
@clarkb:matrix.org	there were a couple things when fungi and I started looking at that. One was the indexes iirc. I forget the other	21:42
@clarkb:matrix.org	it's doable but needs effort	21:42
@jim:acmegating.com	so maybe that needs to be part of the upload routine in ansible python itself	21:42
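One way to picture the "upload job-output.txt first" idea is a stable sort that moves priority files to the front of the upload list. This is only a sketch under stated assumptions: `upload_order` is a hypothetical helper, not part of any Zuul upload role, and treating `job-output.json` as a second priority file is an assumption.

```python
def upload_order(paths, priority=("job-output.txt", "job-output.json")):
    """Return `paths` with priority files first, otherwise in original order."""

    def rank(path):
        name = path.rsplit("/", 1)[-1]  # compare on the file name only
        return priority.index(name) if name in priority else len(priority)

    # sorted() is stable, so non-priority files keep their relative order.
    return sorted(paths, key=rank)


files = [
    "logs/a.log",
    "job-output.json",
    "job-output.txt",
    "zuul-info/inventory.yaml",
]
print(upload_order(files))
```

This only reorders the list; it does not address the index generation and return value concerns raised above.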
@fungicide:matrix.org	oh. yes i forgot we'd already merged the change to upload the basic logs first in a separate playbook	21:45
@goneri:matrix.org	We also had a case where gather_facts was timing out. It was the same: without the WebSocket opened at the right time, it was hard to troubleshoot the problem.	21:48
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul] 873742: merger: Keep redundant cherry-pick commits https://review.opendev.org/c/zuul/zuul/+/873742		22:47
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 873692: Return cached Github change on concurrent update https://review.opendev.org/c/zuul/zuul/+/873692		22:55
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 872908: Cleanup old rebase-merge dirs on repo reset https://review.opendev.org/c/zuul/zuul/+/872908		23:07
