*** Guest2352 is now known as prometheanfire | 00:34 | |
*** prometheanfire is now known as Guest2640 | 00:35 | |
*** Guest2640 is now known as prometheanfire | 00:36 | |
*** ykarel_ is now known as ykarel | 05:15 | |
*** rpittau|afk is now known as rpittau | 07:10 | |
*** ykarel is now known as ykarel|lunch | 08:24 | |
*** ykarel|lunch is now known as ykarel | 09:51 | |
*** jcapitao is now known as jcapitao_lunch | 10:29 | |
*** jcapitao_lunch is now known as jcapitao | 12:18 | |
*** rlandy is now known as rlandy|ruck | 12:19 | |
elodilles | hi, i'm planning to run the 'stale eol branch delete' script to remove another batch of eol'd branches (this time we have around a dozen branches to delete) | 14:50 |
elodilles | as far as I see gerrit is fully operational so I guess it shouldn't be a problem to start now | 14:51 |
elodilles | clarkb fungi : fyi, as usual ^^^ | 14:51 |
clarkb | elodilles: yup should be fine. We're planning a downtime tomorrow around this time though | 14:52 |
elodilles | ack, thanks for the info :) | 14:53 |
fungi | agreed, go for it | 14:54 |
*** ykarel is now known as ykarel|away | 15:50 | |
elodilles | script finished without errors (list of deleted branches: https://paste.opendev.org/show/807797/ ) | 15:52 |
fungi | great! thanks for running that | 15:53 |
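The branch removal elodilles describes maps onto Gerrit's standard REST API for deleting branches. The sketch below is not the release team's actual script; it is a minimal illustration assuming HTTP basic auth against review.opendev.org and a hypothetical `eol_branches.txt` input listing `project:branch` pairs.

```python
# Minimal sketch (not the actual "stale eol branch delete" script) of removing
# already-tagged EOL branches via Gerrit's REST API.
import urllib.parse

import requests

GERRIT = "https://review.opendev.org"
AUTH = ("myuser", "my-http-password")  # hypothetical credentials


def delete_branch(project: str, branch: str) -> None:
    """Delete a stale branch, e.g. ('openstack/nova-powervm', 'stable/ocata')."""
    url = "{}/a/projects/{}/branches/{}".format(
        GERRIT,
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(branch, safe=""),
    )
    resp = requests.delete(url, auth=AUTH)
    # Gerrit answers 204 No Content on success; anything else is an error.
    resp.raise_for_status()


for line in open("eol_branches.txt"):  # hypothetical input file
    proj, _, br = line.strip().partition(":")
    delete_branch(proj, br)
```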
fungi | elodilles: do you expect these eols to reduce the volume of job failure reports to the stable-maint ml any time soon? we're up to something like 200 messages a day on that list and i can't imagine anyone is reading them all | 15:54 |
elodilles | well, when most of the ocata branches were deleted we went down from ~150 mails to ~120 mails, so it's around that now | 15:57 |
elodilles | to tell you the truth I don't think the number will be reduced that much | 15:58 |
elodilles | I mean, for now... | 15:58 |
fungi | i wonder if we should be using a different mechanism for tracking stable job failures. sending an e-mail message about each and every one when there are hundreds a day doesn't sound efficient | 15:58 |
fungi | zuul already has a dashboard we can query to see which jobs fail by branch, by project, in specific pipelines, et cetera | 15:59 |
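The dashboard fungi mentions is backed by Zuul's public builds API, which can be filtered by pipeline, project, branch and result. Below is a hedged sketch of that kind of query against opendev's Zuul; the tenant name "openstack" and the pipeline name "periodic-stable" are assumptions based on the discussion here.

```python
# Sketch: list the projects/branches with the most failing periodic-stable
# builds, using Zuul's builds API instead of per-failure e-mails.
from collections import Counter

import requests

ZUUL = "https://zuul.opendev.org/api/tenant/openstack/builds"

params = {"pipeline": "periodic-stable", "result": "FAILURE", "limit": 500}
builds = requests.get(ZUUL, params=params).json()

# Count failures per (project, branch) pair and print the worst offenders.
failures = Counter((b["project"], b["branch"]) for b in builds)
for (project, branch), count in failures.most_common(20):
    print(f"{count:4d}  {project}  {branch}")
```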
elodilles | actually i don't know who else follows the mails beyond me o:) | 16:00 |
elodilles | are there a lot of people subscribed to stable-maint? | 16:00 |
fungi | i'll check, but just because they're subscribed doesn't mean their address is still valid or that they're looking at the messages | 16:01 |
fungi | mailman has 85 subscribers for the openstack-stable-maint ml, and a bunch of these i know to be people who are no longer active in openstack... i suspect that's the case for almost all of them in fact | 16:02 |
*** rpittau is now known as rpittau|afk | 16:03 | |
fungi | but at this point, openstack-stable-maint traffic accounts for roughly half of the legitimate message volume across all the mailing lists we maintain | 16:04 |
clarkb | Similarly, if those jobs fail consistently and no one is working to fix them, should we stop running the jobs? | 16:06 |
fungi | it would definitely free up ci resources | 16:07 |
elodilles | yes, that's true. the main issue is that teams are not fixing their stable branch gate failures (and not even tracking their state) | 16:09 |
clarkb | I know for EM we said we'd just turn things off when that happened. I suspect this is happening on the more supported stable branches too though? | 16:09 |
elodilles | clarkb: actually not | 16:10 |
elodilles | we don't really turn off failing tests | 16:10 |
elodilles | I mean where maintenance is active, almost all jobs are still working | 16:10 |
elodilles | (nova, neutron, cinder, etc.) | 16:11 |
elodilles | however, for example lower-constraints jobs were deleted after some TC discussions, | 16:11 |
elodilles | when the new pip-resolver revealed version conflicts | 16:12 |
elodilles | and grenade jobs were disabled for some of the projects | 16:12 |
elodilles | otherwise the less active projects have just broken gates, and that's it :/ | 16:13 |
elodilles | currently, most of the periodic-job failures are due to the easy_install issue in xenial based jobs | 16:14 |
fungi | so to reiterate, are those likely to ever be fixed? and if not, can we stop running them (at least for the projects where we expect they'll never be fixed)? | 16:16 |
elodilles | I have a TODO to fix it in some general way (strangely it is not a problem for some repos, and for some others it was enough to install pbr in advance, but a general fix is still on my TODO) | 16:17 |
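The per-repo workaround elodilles mentions (installing pbr ahead of time so that easy_install on the old Xenial toolchain never has to resolve it itself) amounts to a pre-install step along these lines; where exactly it would run (tox, a job pre-task) is an assumption.

```python
# Sketch of the "install pbr in advance" workaround: pre-install pbr with pip
# so an old setuptools/easy_install never tries to fetch the pbr setup_requires
# dependency on its own.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "pbr"])
```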
fungi | the opendev sysadmins reserve the right to bypass zuul gating and gerrit acls in order to merge job configuration changes to projects where necessary, so if the problem is just a lack of reviewers on those projects approving changes we have ways around it | 16:18 |
elodilles | fungi: that could help | 16:18 |
fungi | though here we're talking specifically about official openstack projects, so we would probably clear it with the openstack tc | 16:18 |
fungi | or at least give them a heads up when we do it | 16:18 |
elodilles | since some of the issues are due to inactive projects i think a heads up would be good | 16:20 |
fungi | right. in many cases the projects may be active but nobody is maintaining their older stable branches or paying attention to job failures on them, so they just need to be encouraged to eol those branches | 16:29 |
clarkb | right I think the indication here may be that those branches are now dead and should be treated that way | 16:31 |
elodilles | yes, that's true. I'm thinking of collecting such repos now and sending a mail to the teams asking them to either fix or prepare to eol their broken branches | 16:32 |
elodilles | for inactive projects, I could try to propose a patch like this: https://review.opendev.org/c/openstack/nova-powervm/+/802913 | 16:33 |
elodilles | though I guess these need some extra rights, as the gate won't allow them to merge (since the gate is broken) | 16:34 |
fungi | yeah, i or someone else on the tact sig who is also a gerrit sysadmin could bypass gating to merge that directly, with the acknowledgement of the tc | 16:35 |
elodilles | anyway, let me send some mails in the next days to teams about their broken stable gates | 16:36 |
fungi | though if that project can't merge any changes because its check jobs are also failing in master, then maybe we need to talk to the tc about retiring the project entirely | 16:36 |
fungi | or possibly retiring the entire team if all of their deliverables are in this state | 16:37 |
fungi | may also be worth the tc bringing this particular case up with the multi-arch sig | 16:37 |
elodilles | is there a process now that monitors the teams'/projects' health? | 16:39 |
fungi | elodilles: not really. the tc mostly responds to evidence like this to make determinations | 16:40 |
fungi | "health" is a really hard thing to define, much less measure, so it makes more sense to base decisions on whether teams are keeping up with general expectations for maintaining their deliverables (responding to vulnerability reports, participating in scheduled releases, keeping jobs running) | 16:41 |
elodilles | hmmm, I see. I remember something like that existing before, but maybe it was long ago or my memory is tricking me :) | 16:41 |
fungi | there have been multiple attempts, but tracking project health that way turned out to be more work than just taking over those project duties would have been | 16:42 |
elodilles | i see | 16:42 |
fungi | there are so many more projects than there are tc members that assigning tc members to individually evaluate teams on an ongoing basis has always been untenable | 16:44 |
elodilles | that's understandable | 16:44 |
fungi | right now there are nearly 10 official project teams for every tc member | 16:44 |
elodilles | ok, so I think this is what I can do now: | 16:45 |
elodilles | since I'm checking the periodic-stable job failures anyway, | 16:45 |
elodilles | I will send mails to teams about the state of their stable branches (the ones that have been failing for a longer period), | 16:46 |
elodilles | and if we don't get replies then I can forward these cases to TC | 16:47 |
elodilles | what do you think? is this a good approach? | 16:48 |
clarkb | that seems reasonable to me | 16:51 |
clarkb | give people a chance to volunteer and help fix it; if that doesn't happen we can proceed to cleanup | 16:51 |
fungi | yep, makes sense | 16:52 |
elodilles | ok, thanks, then I will start looking into this tomorrow and then we will see | 16:54 |
elodilles | and hopefully this will soon eliminate the majority of the daily ~120 mails :) | 16:55 |
elodilles | and free up some CI resource, of course | 16:56 |
fungi | thanks! | 16:57 |
elodilles | no problem :) | 17:00 |
elodilles | and thanks too! | 17:01 |
*** sshnaidm is now known as sshnaidm|afk | 18:30 | |
opendevreview | James E. Blair proposed openstack/project-config master: Remove report-build-page from zuul tenant config https://review.opendev.org/c/openstack/project-config/+/802973 | 23:42 |