*** Guest2352 is now known as prometheanfire | 00:34 | |
*** prometheanfire is now known as Guest2640 | 00:35 | |
*** Guest2640 is now known as prometheanfire | 00:36 | |
*** ykarel_ is now known as ykarel | 05:15 | |
*** rpittau|afk is now known as rpittau | 07:10 | |
*** ykarel is now known as ykarel|lunch | 08:24 | |
*** ykarel|lunch is now known as ykarel | 09:51 | |
*** jcapitao is now known as jcapitao_lunch | 10:29 | |
*** jcapitao_lunch is now known as jcapitao | 12:18 | |
*** rlandy is now known as rlandy|ruck | 12:19 | |
elodilles | hi, i'm planning to run the 'stale eol branch delete' script to remove another batch of eol'd branches (this time we have around a dozen branches to delete) | 14:50 |
elodilles | as far as I see gerrit is fully operational so I guess it shouldn't be a problem to start now | 14:51 |
elodilles | clarkb fungi : fyi, as usual ^^^ | 14:51 |
clarkb | elodilles: yup should be fine. We're planning a downtime tomorrow around this time though | 14:52 |
elodilles | ack, thanks for the info :) | 14:53 |
fungi | agreed, go for it | 14:54 |
*** ykarel is now known as ykarel|away | 15:50 | |
elodilles | script finished without errors (list of deleted branches: https://paste.opendev.org/show/807797/ ) | 15:52 |
fungi | great! thanks for running that | 15:53 |
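The branch removal elodilles describes maps onto Gerrit's standard REST API for deleting branches. The sketch below is not the release team's actual script; it is a minimal illustration assuming HTTP basic auth against review.opendev.org and a hypothetical `eol_branches.txt` input listing `project:branch` pairs.

```python
# Minimal sketch (not the actual "stale eol branch delete" script) of removing
# already-tagged EOL branches via Gerrit's REST API.
import urllib.parse

import requests

GERRIT = "https://review.opendev.org"
AUTH = ("myuser", "my-http-password")  # hypothetical credentials


def delete_branch(project: str, branch: str) -> None:
    """Delete a stale branch, e.g. ('openstack/nova-powervm', 'stable/ocata')."""
    url = "{}/a/projects/{}/branches/{}".format(
        GERRIT,
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(branch, safe=""),
    )
    resp = requests.delete(url, auth=AUTH)
    # Gerrit answers 204 No Content on success; anything else is an error.
    resp.raise_for_status()


for line in open("eol_branches.txt"):  # hypothetical input file
    proj, _, br = line.strip().partition(":")
    delete_branch(proj, br)
```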
fungi | elodilles: do you expect these eols to reduce the volume of job failure reports to the stable-maint ml any time soon? we're up to something like 200 messages a day on that list and i can't imagine anyone is reading them all | 15:54 |
elodilles | well, when most of the ocata branches were deleted we went down from ~150 mails to ~120 mails, so it's around that now | 15:57 |
elodilles | to tell you the truth I don't think the number will be reduced that much | 15:58 |
elodilles | I mean, for now... | 15:58 |
fungi | i wonder if we should be using a different mechanism for tracking stable job failures. sending an e-mail message about each and every one when there are hundreds a day doesn't sound efficient | 15:58 |
fungi | zuul already has a dashboard we can query to see which jobs fail by branch, by project, in specific pipelines, et cetera | 15:59 |
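The dashboard fungi mentions is backed by Zuul's public builds API, which can be filtered by pipeline, project, branch and result. Below is a hedged sketch of that kind of query against opendev's Zuul; the tenant name "openstack" and the pipeline name "periodic-stable" are assumptions based on the discussion here.

```python
# Sketch: list the projects/branches with the most failing periodic-stable
# builds, using Zuul's builds API instead of per-failure e-mails.
from collections import Counter

import requests

ZUUL = "https://zuul.opendev.org/api/tenant/openstack/builds"

params = {"pipeline": "periodic-stable", "result": "FAILURE", "limit": 500}
builds = requests.get(ZUUL, params=params).json()

# Count failures per (project, branch) pair and print the worst offenders.
failures = Counter((b["project"], b["branch"]) for b in builds)
for (project, branch), count in failures.most_common(20):
    print(f"{count:4d}  {project}  {branch}")
```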
elodilles | actually i don't know who else follows the mails beyond me o:) | 16:00 |
elodilles | are there a lot of people subscribed to stable-maint? | 16:00 |
fungi | i'll check, but just because they're subscribed doesn't mean their address is still valid or that they're looking at the messages | 16:01 |
fungi | mailman has 85 subscribers for the openstack-stable-maint ml, and a bunch of these i know to be people who are no longer active in openstack... i suspect that's the case for almost all of them in fact | 16:02 |
*** rpittau is now known as rpittau|afk | 16:03 | |
fungi | but at this point, openstack-stable-maint traffic accounts for roughly half of the legitimate message volume across all the mailing lists we maintain | 16:04 |
clarkb | Similarly, if those jobs fail consistently and no one is working to fix them, should we stop running the jobs? | 16:06 |
fungi | it would definitely free up ci resources | 16:07 |
elodilles | yes, that's true. the main issue is that teams are not fixing their stable branch gate failures (and not even tracking their state) | 16:09 |
clarkb | I know for EM we said we'd just turn things off when that happened. I suspect this is happening on the more supported stable branches too though? | 16:09 |
elodilles | clarkb: actually not | 16:10 |
elodilles | we don't really turn off failing tests | 16:10 |
elodilles | I mean where maintenance is active, almost all jobs are still working | 16:10 |
elodilles | (nova, neutron, cinder, etc.) | 16:11 |
elodilles | however, for example lower-constraints jobs were deleted after some TC discussions, | 16:11 |
elodilles | when the new pip-resolver revealed version conflicts | 16:12 |
elodilles | and grenade jobs were disabled for some of the projects | 16:12 |
elodilles | otherwise the less active projects have just broken gates, and that's it :/ | 16:13 |
elodilles | currently, most of the periodic-job failures are due to the easy_install issue in xenial based jobs | 16:14 |
fungi | so to reiterate, are those likely to ever be fixed? and if not, can we stop running them (at least for the projects where we expect they'll never be fixed)? | 16:16 |
elodilles | I have a TODO to fix it in some general way (strangely it is not a problem for some repos, and for some others it was enough to install pbr in advance, but a general fix is still on my TODO) | 16:17 |
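The per-repo workaround elodilles mentions (installing pbr ahead of time so that easy_install on the old Xenial toolchain never has to resolve it itself) amounts to a pre-install step along these lines; where exactly it would run (tox, a job pre-task) is an assumption.

```python
# Sketch of the "install pbr in advance" workaround: pre-install pbr with pip
# so an old setuptools/easy_install never tries to fetch the pbr setup_requires
# dependency on its own.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "pbr"])
```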
fungi | the opendev sysadmins reserve the right to bypass zuul gating and gerrit acls in order to merge job configuration changes to projects where necessary, so if the problem is just a lack of reviewers on those projects approving changes we have ways around it | 16:18 |
elodilles | fungi: that could help | 16:18 |
fungi | though here we're talking specifically about official openstack projects, so we would probably clear it with the openstack tc | 16:18 |
fungi | or at least give them a heads up when we do it | 16:18 |
elodilles | since some of the issues are due to inactive projects i think a heads up would be good | 16:20 |
fungi | right. in many cases the projects may be active but nobody is maintaining their older stable branches or paying attention to job failures on them, so they just need to be encouraged to eol those branches | 16:29 |
clarkb | right I think the indication here may be that those branches are now dead and should be treated that way | 16:31 |
elodilles | yes, that's true. I'm thinking of collecting such repos now and sending a mail to the teams asking them to either fix or prepare to eol their broken branches | 16:32 |
elodilles | for inactive projects, I could try to propose a patch like this: https://review.opendev.org/c/openstack/nova-powervm/+/802913 | 16:33 |
elodilles | though I guess these need some extra rights, as the gate won't allow them to merge (since the gate is broken) | 16:34 |
fungi | yeah, i or someone else on the tact sig who is also a gerrit sysadmin could bypass gating to merge that directly, with the acknowledgement of the tc | 16:35 |
elodilles | anyway, let me send some mails in the next days to teams about their broken stable gates | 16:36 |
fungi | though if that project can't merge any changes because its check jobs are also failing in master, then maybe we need to talk to the tc about retiring the project entirely | 16:36 |
fungi | or possibly retiring the entire team if all of their deliverables are in this state | 16:37 |
fungi | may also be worth the tc bringing this particular case up with the multi-arch sig | 16:37 |
elodilles | is there a process now that monitors the teams'/projects' health? | 16:39 |
fungi | elodilles: not really. the tc mostly responds to evidence like this to make determinations | 16:40 |
fungi | "health" is a really hard thing to define, much less measure, so it makes more sense to base decisions on whether teams are keeping up with general expectations for maintaining their deliverables (responding to vulnerability reports, participating in scheduled releases, keeping jobs running) | 16:41 |
elodilles | hmmm, I see. I remember something like that existing before, but maybe it was long ago or my memory is tricking me :) | 16:41 |
fungi | there have been multiple attempts, but tracking project health that way turned out to be more work than just taking over those project duties would have been | 16:42 |
elodilles | i see | 16:42 |
fungi | there are so many more projects than there are tc members that assigning tc members to individually evaluate teams on an ongoing basis has always been untenable | 16:44 |
elodilles | that's understandable | 16:44 |
fungi | right now there are nearly 10 official project teams for every tc member | 16:44 |
elodilles | ok, so I think this is what I can do now: | 16:45 |
elodilles | since I'm checking the periodic-stable job failures anyway, | 16:45 |
elodilles | I will send mails to teams about the state of their stable branches (the ones that have been failing for a longer period), | 16:46 |
elodilles | and if we don't get replies then I can forward these cases to TC | 16:47 |
elodilles | what do you think? is this a good approach? | 16:48 |
clarkb | that seems reasonable to me | 16:51 |
clarkb | give people a chance to volunteer and help fix it; if that doesn't happen we can proceed to cleanup | 16:51 |
fungi | yep, makes sense | 16:52 |
elodilles | ok, thanks, then I will start looking into this tomorrow and then we will see | 16:54 |
elodilles | and hopefully this will soon eliminate the majority of the daily ~120 mails :) | 16:55 |
elodilles | and free up some CI resource, of course | 16:56 |
fungi | thanks! | 16:57 |
elodilles | no problem :) | 17:00 |
elodilles | and thanks too! | 17:01 |
*** sshnaidm is now known as sshnaidm|afk | 18:30 | |
opendevreview | James E. Blair proposed openstack/project-config master: Remove report-build-page from zuul tenant config https://review.opendev.org/c/openstack/project-config/+/802973 | 23:42 |