opendevreview | Ghanshyam proposed openstack/governance master: Add link to Yoga announcement https://review.opendev.org/c/openstack/governance/+/799926 | 00:15 |
opendevreview | Ghanshyam proposed openstack/governance master: Define Yoga release testing runtime https://review.opendev.org/c/openstack/governance/+/799927 | 00:22 |
opendevreview | Merged openstack/governance master: Add DPL model also in 'Appointing leaders' section https://review.opendev.org/c/openstack/governance/+/797985 | 00:38 |
*** rpittau|afk is now known as rpittau | 06:59 | |
*** slaweq_ is now known as slaweq | 11:42 | |
* jungleboyj sighs .... Yoga ... didn't think it would go that far down the list. | 12:13 | |
*** rpittau is now known as rpittau|afk | 12:45 | |
fungi | i was holding out hope for yog-sothoth | 12:48 |
* jungleboyj laughs | 12:50 | |
jungleboyj | I am surprised that Yoga went through as it is one of the laptop lines we have at Lenovo. | 12:50 |
tosky | I know it's late, but maybe using https://spaceballs.fandom.com/wiki/Yogurt instead of Yoghurt would have changed the final result | 12:51 |
jungleboyj | :-) Oh Spaceballs. | 12:52 |
gmann | jungleboyj: I had a Yoga laptop but it stopped working after a year or so :) | 12:54 |
gmann | but I think we should find some way to do a pre-sanity trademark check before the vote | 12:54 |
jungleboyj | gmann: :-( That is no good. I have had several Yogas and they have all worked well, except for one where it took them a while to figure out it had a bad battery. | 12:55 |
jungleboyj | gmann: Which model? | 12:55 |
gmann | Yoga 910 | 12:55 |
jungleboyj | gmann: Oh, that is a nice one. I think that is the one my son has. Surprised it didn't last longer. | 12:56 |
gmann | it seems to be a motherboard issue but I need to send it to the service center. maybe the motherboard is as costly as a new laptop since it is not under warranty | 12:56 |
jungleboyj | I always get the 3 year warranty though. That has been my luck with Laptops. :-) | 13:06 |
*** poojajadhav is now known as pojadhav | 13:16 | |
gmann | tc-members: meeting time. | 15:00 |
gmann | #startmeeting tc | 15:00 |
opendevmeet | Meeting started Thu Jul 8 15:00:17 2021 UTC and is due to finish in 60 minutes. The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'tc' | 15:00 |
gmann | #topic Roll call | 15:00 |
diablo_rojo_phone | o/ | 15:00 |
gmann | o/ | 15:00 |
jungleboyj | o/ | 15:00 |
ricolin | o/ | 15:00 |
belmoreira | o/ | 15:00 |
dansmith | o/ | 15:00 |
clarkb | hello | 15:00 |
gmann | clarkb: hi | 15:00 |
gmann | yoctozepto is on PTO so he will not be able to join today's meeting | 15:00 |
dansmith | did we approve that time off? | 15:01 |
gmann | :) | 15:01 |
fungi | i told him it was okay | 15:01 |
jungleboyj | :-) | 15:01 |
gmann | let's start | 15:01 |
gmann | #topic Follow up on past action items | 15:01 |
gmann | gmann to remove Governance non-active repos cleanup topic from agenda | 15:02 |
gmann | done | 15:02 |
gmann | gmann to remove election assignments topic form agenda | 15:02 |
gmann | this too | 15:02 |
gmann | ricolin to ask for collecting the ops pain points on openstack-discuss ML | 15:02 |
gmann | ricolin: any update on this | 15:02 |
ricolin | already added it to the community-goals backlog and the Y-cycle pre-selected list, but have not sent it out on the ML yet | 15:03 |
*** frickler is now known as frickler_pto | 15:03 | |
gmann | +1. i think that is good | 15:03 |
spotz | o/ | 15:03 |
ricolin | will send it out this week | 15:03 |
ricolin | on ML | 15:03 |
gmann | ok, thanks | 15:03 |
gmann | gmann to propose the RBAC goal | 15:04 |
gmann | I proposed that #link https://review.opendev.org/c/openstack/governance/+/799705 | 15:04 |
gmann | please review | 15:04 |
gmann | #topic Gate health check (dansmith/yoctozepto) | 15:04 |
gmann | dansmith: any news | 15:05 |
dansmith | I really have nothing to report, but mostly because I've been too busy with other stuff to be submitting many patches in the last week or so | 15:05 |
gmann | ok | 15:05 |
gmann | one thing to share is about the log warnings, especially from oslo.policy | 15:05 |
fungi | we've had a bit of job configuration upheaval from the zuul 4.6.0 security release | 15:05 |
gmann | melwitt and clarkb pointed that out in the infra channel, and many projects have a lot of such warnings from policy rules | 15:06 |
gmann | I am fixing those in #link https://review.opendev.org/q/topic:%22fix-oslo-policy-warnings%22+(status:open%20OR%20status:merged) | 15:06 |
fungi | it had to make non-backward-compatible changes to how some kinds of variables are accessed, particularly with regard to secrets, so that's been disrupting some post/promote jobs (should be under control now), and it also made some projects' overall zuul configuration insta-buggy, causing some of their jobs to not run | 15:06 |
fungi | i think kolla was hardest hit by that | 15:07 |
gmann | ok | 15:07 |
gmann | fungi: any affected project without an ack, or needing help on this? | 15:08 |
gmann | I saw on the ML that a few projects acked it and are working on it | 15:08 |
clarkb | gmann: it might be good to update those warnings to only fire once per process | 15:08 |
fungi | i haven't checked in the past few days, but click the bell icon at the top-right of the zuul status page for a list of some which may need help | 15:08 |
clarkb | I can't imagine those warnings help operators any more than they help CI | 15:08 |
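As an illustration of the "fire once per process" idea above, here is a minimal sketch using only the standard library's logging module; it is not the actual oslo.policy change, and the oslo_policy logger name is an assumption for the example.

```python
import logging


class OncePerProcessFilter(logging.Filter):
    """Drop repeated WARNING records so each distinct message logs only once."""

    def __init__(self, name=""):
        super().__init__(name)
        self._seen = set()

    def filter(self, record):
        if record.levelno != logging.WARNING:
            return True
        key = (record.name, record.getMessage())
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


# Attach the filter to the logger a library emits on; the logger name below
# is illustrative, not taken from the oslo.policy source.
logging.basicConfig(level=logging.WARNING)
policy_logger = logging.getLogger("oslo_policy.policy")
policy_logger.addFilter(OncePerProcessFilter())

for _ in range(3):
    policy_logger.warning("Policy rule default changed; update your overrides")
# Only the first of the three identical warnings is emitted.
```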
gmann | fungi: ok, thanks for the update. let us know if any project did not notice or needs help | 15:09 |
gmann | back to policy rule warning | 15:10 |
gmann | clarkb: yes, that seems very noisy now | 15:10 |
gmann | when we added it initially we thought it would help operators move to the new RBAC, but in the new RBAC work every policy rule changed its default, so it warns | 15:11 |
gmann | which does not seem to help much | 15:11 |
gmann | One approach I sent on the ML is to disable those by default and make it configurable, so that operators can enable them to see what they need to update | 15:11 |
gmann | #link http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023484.html | 15:11 |
gmann | and this is patch #link https://review.opendev.org/c/openstack/oslo.policy/+/799539 | 15:12 |
gmann | feel free to respond on the ML or in Gerrit with your opinion | 15:12 |
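For context on what "disabled by default but configurable" could look like, here is a hedged sketch using oslo.config; the option name and wiring are hypothetical and may differ from the oslo.policy patch under review.

```python
from oslo_config import cfg

CONF = cfg.CONF

# Hypothetical option name; the real patch may use a different one.
_opts = [
    cfg.BoolOpt(
        "enforce_new_defaults_warnings",
        default=False,
        help="Emit warnings when a policy rule's default changes.",
    ),
]
CONF.register_opts(_opts, group="oslo_policy")


def maybe_warn(logger, rule_name, old_default, new_default):
    """Log the default-change warning only when the operator opted in."""
    if not CONF.oslo_policy.enforce_new_defaults_warnings:
        return
    logger.warning(
        "Default for policy %s changed from %s to %s; review your overrides.",
        rule_name, old_default, new_default,
    )
```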
gmann | anything else to discuss related to gate health? | 15:13 |
gmann | #topic Migration from 'Freenode' to 'OFTC' (gmann) | 15:14 |
gmann | #link https://etherpad.opendev.org/p/openstack-irc-migration-to-oftc | 15:14 |
gmann | I started pushing the patches for remaining projects #link https://review.opendev.org/q/topic:%22oftc%22+(status:open%20OR%20status:merged) | 15:14 |
gmann | few are still left | 15:14 |
gmann | nothing else to share on this | 15:15 |
fungi | today we landed an update to the opendev infra manual as well, so if you refer anyone there it should now properly reference oftc and not freenode | 15:15 |
gmann | +1 | 15:15 |
gmann | #topic Xena Tracker | 15:16 |
spotz | +1 | 15:16 |
gmann | #link https://etherpad.opendev.org/p/tc-xena-tracker | 15:16 |
gmann | I think we can close 'election promotion' now as we have three new election officials | 15:17 |
gmann | spotz: belmoreira diablo_rojo_phone ? what do you say? | 15:17 |
gmann | L63 in etherpad | 15:17 |
fungi | i'm very excited by that, and happy to answer questions anyone has | 15:17 |
jungleboyj | \o/ | 15:17 |
spotz | Yeah and we now have a name for that patch | 15:17 |
gmann | and the email opt-in process or solution can be discussed by you folks in the election channel | 15:18 |
belmoreira | lgtm | 15:18 |
gmann | thanks again for volunteering | 15:18 |
gmann | Charter revision also done so marked as completed | 15:19 |
diablo_rojo_phone | Yes we can close it. | 15:19 |
gmann | any other update on Xena tracker? | 15:20 |
gmann | jungleboyj: mnaser any update you want to share for 'stable policy process change' ? | 15:20 |
jungleboyj | No, didn't get to that with the holiday week. | 15:21 |
gmann | ok | 15:21 |
gmann | we have 8 items in the etherpad to finish in Xena; let's start working on those, which should not take much time | 15:22 |
gmann | moving next.. | 15:22 |
gmann | #topic ELK services plan and help status | 15:22 |
gmann | first is Board meeting updates | 15:23 |
gmann | I presented this slide in 30th June Board meeting #link https://docs.google.com/presentation/u/1/d/1ugdwMI2ZM2L8z1sobzHJwDpbvlyWKH02PH7Fi4tkyVc/edit#slide=id.ge1bdf71dac_0_0 | 15:23 |
gmann | I was expecting some actionable items from the Board, but that did not happen. | 15:23 |
gmann | The Board acked this help-needed item and said they would broadcast it within their organizations/local communities, etc. | 15:24 |
gmann | which I think everyone has been doing since 2018, when we re-defined the upstream investment opportunities | 15:25 |
gmann | honestly speaking, I am not so happy with the lack of actionable items from that meeting | 15:26 |
gmann | and I do not know how we can get help here | 15:26 |
spotz | I took it as folks were going back to their own companies | 15:26 |
spotz | It was a bit late for me though | 15:26 |
gmann | yeah, their own companies too | 15:26 |
gmann | but that is no different from what we all, including the Board, have been trying since 2018 | 15:27 |
fungi | not to apologize for them, but i don't expect the board members to come to those meetings expecting to make commitments on behalf of their employers, and they probably don't control the budget that assistance would be provided out of in most cases (they're often in entirely separate business units), so they have to lobby internally for that sort of thing | 15:27 |
jungleboyj | fungi: True. | 15:28 |
fungi | i'm more disappointed by the years of inaction than in their inability to make any immediate promises | 15:28 |
gmann | few of the suggestion are listed in the slide#5 #link https://docs.google.com/presentation/d/1ugdwMI2ZM2L8z1sobzHJwDpbvlyWKH02PH7Fi4tkyVc/edit#slide=id.ge1bdf71dac_0_24 | 15:28 |
gmann | that was my expectation and hope. I know those are not easy, but in the current situation we need such support | 15:28 |
gmann | anyways that is update from Board meeting. moving next.. | 15:30 |
gmann | Creating a timeline for shutting the service down if help isn't found | 15:30 |
gmann | clarkb please go ahead | 15:31 |
clarkb | This is mostly a request that we start thinking about what the timeline looks like if we don't end up with help to update the system or host it somewhere else | 15:31 |
clarkb | I'm not currently in a rush to shut it down, but there is a risk that external circumstances could force that to be done (security concerns or similar) | 15:32 |
clarkb | However, I think it would be good to have some agreement on what not a rush means :) | 15:32 |
jungleboyj | :-( | 15:32 |
clarkb | part of the reason this came up was that after a week or two it was noticed that the cluster had completely crashed and I had to go resurrect it | 15:32 |
clarkb | I don't want to do that indefinitely if there isn't proper care and feeding happening | 15:33 |
clarkb | There are also a few problems with indexing currently, including the massive log files generated by unit tests due to warnings, and for some reason logstash is emitting events for centuries in the future, which floods the elasticsearch cluster with indexes for the future | 15:33 |
clarkb | I think the massive log files led to the cluster crashing. The future events problem is more annoying than anything else | 15:34 |
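One way to keep far-future events out of the cluster is a timestamp sanity check before indexing. The sketch below is illustrative only; the "@timestamp" field name follows the usual logstash convention and is an assumption about the local pipeline.

```python
from datetime import datetime, timedelta, timezone

# Reject events whose timestamp lies implausibly far in the future.
MAX_FUTURE_SKEW = timedelta(hours=1)


def is_plausible(event):
    ts = datetime.fromisoformat(event["@timestamp"].replace("Z", "+00:00"))
    return ts <= datetime.now(timezone.utc) + MAX_FUTURE_SKEW


events = [
    {"@timestamp": "2021-07-08T15:30:00Z", "message": "ok"},
    {"@timestamp": "2525-01-01T00:00:00Z", "message": "far-future noise"},
]
indexable = [e for e in events if is_plausible(e)]  # keeps only the first event
```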
gmann | yeah, we should start fixing those warnings. maybe we can ask all projects on the ML. I can fix oslo.policy but do not have the bandwidth to fix the others | 15:34 |
gmann | back to shutdown thing | 15:35 |
gmann | so if we shut it down, the big question is how we are going to debug failures, and how much extra load it will add on the gate in terms of rechecks .. | 15:35 |
clarkb | gmann: to be fair I think most people just recheck anyway and don't do debugging | 15:35 |
gmann | yeah but not all | 15:35 |
gmann | after shutdown there will be many rechecks we have to do | 15:35 |
clarkb | where elastic-recheck has been particularly useful is when you have an sdague, jogo, mtreinish, melwitt, or dansmith digging into broader failures and trying to address them | 15:35 |
dansmith | yeah, I try to shame people that just blindly recheck, | 15:36 |
dansmith | but it's a bit of a losing battle | 15:36 |
dansmith | still, removing the *ability* to do real checking sucks :/ | 15:36 |
clarkb | I suspect the biggest impact will not be recheck problems but the once a cycle or so fix a very unstable gate | 15:36 |
dansmith | ...or a more continuously unstable gate | 15:36 |
gmann | yeah | 15:36 |
clarkb | ya | 15:36 |
gmann | which will directly impact our release | 15:37 |
gmann | or feature implementation | 15:37 |
fungi | though it sounds like the entire cluster was broken for a couple weeks there before anyone noticed it wasn't returning results to their queries | 15:37 |
clarkb | I think that also is part of why it has been so hard to find help for this. When it is a tool you use every 6 months it is less in your mind continuously for care and feeding | 15:37 |
clarkb | fungi: yes, but no one notices if the gate is stable | 15:37 |
dansmith | yeah | 15:37 |
clarkb | which is a big underlying issue here imo | 15:38 |
clarkb | people do notice when there are systemic problems in the gate that need addressing | 15:38 |
clarkb | another reason to have a rough timeline is it may help light a fire under people willing to help | 15:39 |
clarkb | when I brought this up last week gmann suggested the end of the Yoga cycle as a potential deadline | 15:39 |
dansmith | yeah, "no rush" is not as motivating | 15:39 |
gmann | yeah, I am thinking the end of Yoga will be more than 6 months out; we can call this the last critical call for help | 15:40 |
clarkb | That ensures that Xena (hopefully) doesn't have any major changes to the stabilization process. Then in Yoga we can start planning for replacement/shutdown/etc (though that can start earlier too) | 15:40 |
gmann | so if anyone wants to help, they should raise their hand by then | 15:40 |
clarkb | That timeline seems reasonable to me | 15:40 |
gmann | any objection on above deadline ? | 15:41 |
fungi | "no rush" has also been tempered by "but might be tomorrow, depending on outside factors" | 15:41 |
clarkb | fungi: yes and I think that is still the message from me | 15:41 |
dansmith | I'm not happy about the timeline, but accept the need | 15:41 |
dansmith | "happy with, not happy about" you might say :) | 15:42 |
gmann | dansmith: meaning? is it too late or too early? | 15:42 |
clarkb | if we notice abuse of elasticsearch or logstash that requires upgrades to address, we'll be in a situation where we don't have much choice | 15:42 |
jungleboyj | I think that sounds like a reasonable timeline ... even though we don't want one. | 15:42 |
dansmith | gmann: it's me being intentionally vague. I'm good with it, just not happy about it.. necessary, but I worry about the inevitable end where nobody has actually stepped up | 15:42 |
gmann | clarkb: yeah, for outside factors we would not be able to do anything and would shut down early? | 15:42 |
clarkb | gmann: correct | 15:43 |
gmann | k | 15:43 |
gmann | dansmith: correct. my last hope was the Board providing paid resources, but anyway that did not happen | 15:43 |
clarkb | Another concern is the sheer size of the system. I've temporarily shut down 50% of the indexing pipeline and have been monitoring our indexing queue https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=17&orgId=1&from=now-24h&to=now | 15:43 |
clarkb | compared to elasticsearch the logstash workers aren't huge but it is still something. I think I may turn on 10% again and leave it at 40% shutdown for another week then turn off the extra servers if that looks stable. | 15:44 |
clarkb | currently we seem to be just barely keeping up with demand | 15:44 |
fungi | yeah, that's just half the indexing workers, not half the system | 15:44 |
clarkb | (and then having some headroom for feature freeze is a good idea hence only reducing by 40% total) | 15:45 |
gmann | how about keeping only check pipeline logs ? | 15:45 |
clarkb | gmann: I would probably do the opposite and only keep gate | 15:45 |
clarkb | check is too noisy | 15:45 |
clarkb | people push a lot of broken into check :) | 15:45 |
jungleboyj | clarkb: That makes sense to me. | 15:45 |
gmann | clarkb: yeah, but in check we do most of the debugging and make things more stable before the gate | 15:46 |
fungi | the check pipeline results are full of noise failures from bad changes, while the gate pipeline should in theory be things which at least got through check and code review to approval | 15:46 |
clarkb | but that is another option, and reducing the total amount of logs indexed would potentially allow us to remove an elasticsearch server or two (since the major factor there is total storage size) | 15:46 |
clarkb | gmann: yes, but it is very hard to see anything useful in check because you can't really tell if things are just broken because someone didn't run tox locally or if they are really broken | 15:46 |
gmann | yeah | 15:47 |
clarkb | it is still useful to have check, often you want to go and see where something may have been introduced and you can trace that back to check | 15:47 |
clarkb | but if we start trimming logs check is what I would drop first | 15:47 |
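A sketch of the "drop check first, keep gate" trimming idea: filter which builds get submitted for indexing by pipeline. The build record shape and field names here are assumptions for illustration, not the opendev submission code.

```python
# Only index logs from pipelines we care most about when debugging gate health.
INDEXED_PIPELINES = {"gate"}  # "check" is the first thing to drop when trimming


def should_index(build):
    """Return True if this build's logs should be sent to the indexer."""
    return build.get("pipeline") in INDEXED_PIPELINES


builds = [
    {"uuid": "abc", "pipeline": "check"},
    {"uuid": "def", "pipeline": "gate"},
]
to_index = [b for b in builds if should_index(b)]  # only the gate build remains
```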
clarkb | as far as elasticsearch disk consumption goes, we should have a pretty good indication of current db size for 7 days of indexes at the beginning of next week | 15:47 |
clarkb | the data is currently a bit off since we had the cluster crash recently | 15:48 |
clarkb | that info is available in our cacti instance if you want to see what usage looks like. We have 6TB of storage available but 5TB usable because we need to be tolerant to losing one server and its 1TB of disk | 15:48 |
clarkb | If we want to start pruning logs out then maybe we start that conversation next week when we have a good baseline of data to look at first | 15:49 |
gmann | or truncate the log storage time? to 2-3 days | 15:49 |
clarkb | yes that is another option | 15:49 |
fungi | though that doesn't give you much history to be able to identify when a particular failure started | 15:50 |
fungi | a week is already fairly short in that regard | 15:50 |
clarkb | yup, but may be enough to identify the source of problems and then work backward in code | 15:50 |
gmann | yeah, we are going to lose that anyway | 15:50 |
clarkb | as well as track what issues are still occurring | 15:50 |
gmann | yes | 15:50 |
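Shortening the retention window could be automated against a date-suffixed index scheme. The sketch below assumes the conventional logstash-YYYY.MM.DD naming and a local Elasticsearch URL; neither is taken from the production opendev setup.

```python
from datetime import datetime, timedelta, timezone

import requests

ES_URL = "http://localhost:9200"  # assumed endpoint for illustration
RETENTION_DAYS = 3

cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
# _cat/indices with h=index returns one index name per line.
indices = requests.get(f"{ES_URL}/_cat/indices/logstash-*?h=index").text.split()

for name in indices:
    try:
        day = datetime.strptime(name, "logstash-%Y.%m.%d").replace(tzinfo=timezone.utc)
    except ValueError:
        continue  # skip anything that doesn't match the expected pattern
    if day < cutoff:
        requests.delete(f"{ES_URL}/{name}")  # drop indices past the retention window
```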
clarkb | anyway I think discussion for pruning elasticsearch size is better next week when we have better data to look at. I'm happy to help collect some of that info together and discuss it further next week if we like | 15:51 |
fungi | i wonder if we could change the indexing threshold to >info instead of >debug | 15:51 |
clarkb | (this is about all I had on this agenda item. I'll go ahead and make note of the Yoga deadline on the mailing list in a response to the thread I started a while back if I can find it) | 15:51 |
gmann | clarkb: +1 that will be even better | 15:51 |
clarkb | fungi: the issue with that is a good chunk of logs are the job-output.txt files now with no log level | 15:52 |
gmann | clarkb: +1 and thanks for publishing deadline on ML | 15:52 |
clarkb | fungi: this is why the warnings hurt so much | 15:52 |
fungi | ahh, yeah good point | 15:52 |
gmann | on warnings, I will start a thread about fixing them and start converting them to errors on the openstack library side so that projects have to fix them | 15:52 |
gmann | #action clarkb to convey the ELK service shutdown deadline on ML | 15:53 |
gmann | #action gmann to send ML to fix warning and oslo side changes to convert them to error | 15:53 |
gmann | and we will continue discussing it next week | 15:54 |
gmann | thanks clarkb fungi for the updates and maintaining these services | 15:54 |
jungleboyj | ++ | 15:54 |
gmann | #topic Open Reviews | 15:54 |
gmann | #link https://review.opendev.org/q/projects:openstack/governance+is:open | 15:54 |
gmann | I added the link for Yoga release name announcement | 15:55 |
gmann | please review that | 15:55 |
gmann | also Yoga testing runtime #link https://review.opendev.org/c/openstack/governance/+/799927 | 15:55 |
gmann | with no change from what we have in Xena | 15:55 |
gmann | and this one about rbac goal proposal #link https://review.opendev.org/c/openstack/governance/+/799705 | 15:56 |
gmann | and need one more vote in this project-update #link https://review.opendev.org/c/openstack/governance/+/799817 | 15:56 |
clarkb | as a note on the python version available in focal, I think 3.9 is available now | 15:57 |
spotz | 3.9 is also what will be in Stream 9 | 15:57 |
clarkb | oh I guess it is in universe though | 15:57 |
clarkb | probably good to test it but not make it the default | 15:57 |
gmann | clarkb: I think 3.8 | 15:57 |
gmann | clarkb: we have a non-voting unit test job for 3.9 | 15:57 |
clarkb | gmann: it has both. But 3.8 is the default and not in universe :) | 15:58 |
gmann | yeah, default | 15:58 |
gmann | that's all for me today, anything else to discuss ? | 15:59 |
gmann | though 1 min left | 15:59 |
jungleboyj | Nothing here. | 16:00 |
gmann | if nothing, let's close meeting. | 16:00 |
gmann | k | 16:00 |
gmann | thanks all for joining. | 16:00 |
gmann | #endmeeting | 16:00 |
opendevmeet | Meeting ended Thu Jul 8 16:00:16 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.html | 16:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.txt | 16:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/tc/2021/tc.2021-07-08-15.00.log.html | 16:00 |
spotz | Thanks gmann | 16:00 |
ricolin | thanks gmann | 16:00 |
jungleboyj | Thank you! | 16:00 |
*** pojadhav is now known as pojadhav|away | 16:52 | |
opendevreview | Ghanshyam proposed openstack/governance-sigs master: Moving IRC network reference to OFTC https://review.opendev.org/c/openstack/governance-sigs/+/800135 | 23:39 |