dviroel | hi all o/ watcher meeting will start in 5 | 11:55 |
---|---|---|
dviroel | #startmeeting watcher | 12:00 |
opendevmeet | Meeting started Thu Mar 6 12:00:38 2025 UTC and is due to finish in 60 minutes. The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot. | 12:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 12:00 |
opendevmeet | The meeting name has been set to 'watcher' | 12:00 |
dviroel | hi all, who's around today? | 12:01 |
rlandy | o/ | 12:01 |
marios | o/ | 12:01 |
mtembo | o/ | 12:01 |
jgilaber | o/ | 12:01 |
amoralej | o/ | 12:01 |
dviroel | o/ thanks for joining | 12:02 |
dviroel | let's start with today's meeting agenda | 12:02 |
dviroel | #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting (Meeting agenda) | 12:02 |
dviroel | please feel free to add your own topics to the agenda if you want to highlight something :) | 12:02 |
dviroel | first one | 12:03 |
dviroel | #topic Feature Freeze | 12:03 |
dviroel | #link https://releases.openstack.org/epoxy/schedule.html (2025.1 Epoxy Release Schedule) | 12:03 |
dviroel | we are now in feature freeze period | 12:03 |
dviroel | meaning that no new features, configuration changes or even string changes should be merged from now on, or they should at least be avoided | 12:04 |
dviroel | we also don't have any FFE requests so far, so we should be ok for next week | 12:05 |
dviroel | which is RC1 target | 12:05 |
chandankumar | o/ | 12:06 |
dviroel | let me highlight some patches from sean-k-mooney | 12:06 |
dviroel | #link https://review.opendev.org/c/openstack/watcher/+/943206 (Add epoxy prelude) | 12:06 |
dviroel | and | 12:07 |
dviroel | #link https://review.opendev.org/c/openstack/releases/+/943489 (Add cycle highlights for watcher in 2025.1) | 12:07 |
dviroel | thanks sean-k-mooney for proposing these patches | 12:08 |
sean-k-mooney | o/ | 12:08 |
dviroel | if you have any comment or addition to the epoxy prelude, pls add your review there | 12:08 |
sean-k-mooney | i would like to merge the prelude patch early next week | 12:09 |
sean-k-mooney | i.e. monday | 12:09 |
sean-k-mooney | so that it's included in RC1 | 12:09 |
sean-k-mooney | so please provide any feedback before then | 12:09 |
dviroel | +1 | 12:09 |
dviroel | any other comments for this topic? | 12:10 |
dviroel | ok, next one | 12:11 |
dviroel | #topic (rlandy) we need to sort/prioritize PTG topics | 12:11 |
rlandy | thanks dviroel | 12:12 |
rlandy | we have a lot of topics on the etherpad | 12:12 |
rlandy | some of which may need sessions with other DFGs | 12:12 |
rlandy | or groups I should say | 12:12 |
rlandy | so proposing we prioritize this list ... online or async | 12:13 |
rlandy | whatever the group chooses | 12:13 |
rlandy | ie: we could do this by voting or meeting so that we can settle on a list and request the meeting slots | 12:13 |
marios | i added a sub-point under your agenda item which is related, Ronelle | 12:13 |
rlandy | marios - pls go ahead then | 12:13 |
marios | we need to book slots... so perhaps we can book the times, and then folks can start to request particular slots if they want them | 12:14 |
marios | i got an email about the need for us to choose meeting slots (since I registered the team a few weeks back and they needed a specific contact) | 12:14 |
rlandy | well we don't want a chicken/egg situation with slots and topics | 12:14 |
rlandy | one needs to be sorted first | 12:14 |
marios | we need to select slots in https://ptg.opendev.org/ptg.html | 12:14 |
marios | i would propose we book 1300 to 1600 UTC on mon/tue/wed ? ... then we have specific times sorted if it works (we can discuss the proposal but i think that roughly lines up with the folks that have been active thus far at least) | 12:15 |
marios | we can always cancel slots if we don't use them | 12:15 |
marios | #info etherpads are live https://ptg.opendev.org/etherpads.html | 12:16 |
marios | #link https://etherpad.opendev.org/p/apr2025-ptg-watcher | 12:16 |
marios | so any objections to 1300-1600 UTC for mon/tue/wed? | 12:17 |
marios | otherwise i will book those | 12:17 |
dviroel | works for me | 12:18 |
marios | rlandy: for the topics. either one person (or few) can try and group these and then place them into 'slots' (eg we can vary the time but something like 45 mins per topic to allow 10 mins break or whatever way we decide to split it), OR | 12:19 |
chandankumar | above timing works for me too | 12:19 |
jgilaber | sounds good | 12:19 |
amoralej | wfm, i think it's probably more than we need, but if we can cancel, no problem | 12:19 |
marios | rlandy: OR, we ask the folks who added each of those topics, to add them into the PTG etherpad and we take them in order/as listed | 12:19 |
dviroel | sean-k-mooney: not sure if we can avoid conflicts with nova, since there are slots in all days :) | 12:19 |
marios | amoralej: yes exactly better to book and not use it | 12:19 |
marios | also if we don't book we can still decide to meet ;) | 12:19 |
sean-k-mooney | dviroel: there will be conflict but i think that will be manageable | 12:20 |
marios | it just means it won't be listed and available to anyone that wants to follow | 12:20 |
amoralej | then wfm | 12:20 |
sean-k-mooney | i'm not sure either team will need the full slots on all days | 12:20 |
rlandy | marios: fine with either - as long as we have a system that works to get the most out of the meetings | 12:20 |
sean-k-mooney | i do know that nova was considering mon-wed also because some cores are not around later in the week or are also part of the tc | 12:21 |
marios | i'm wondering | 12:21 |
marios | sean-k-mooney: should we book thu too, in case we want to schedule something with other teams who are busy on mon-wed | 12:22 |
marios | but i don't know if anyone has plans to invite say nova folks | 12:22 |
sean-k-mooney | i was planning to attend their session but we could do it the other way around | 12:22 |
sean-k-mooney | let's stick with the current schedule for now and i can reach out to them | 12:23 |
sean-k-mooney | the other team i think we should try and sync with is horizon and/or telemetry | 12:23 |
sean-k-mooney | i'm not sure if we will have topics for either but we should look at when they will have their sessions and see if we can organise cross-project topics if they are relevant | 12:23 |
dviroel | ok, so we agree with initial marios slot proposal | 12:24 |
marios | thanks dviroel i will book mon-wed 1300-1600 slots | 12:24 |
dviroel | #action marios to book ptg slots: mon-wed 1300-1600 | 12:24 |
marios | rlandy: i guess we will re-visit topics next week? or between now and then | 12:25 |
rlandy | marios: ack - sure | 12:25 |
rlandy | people can vote or self order in the meantime | 12:25 |
rlandy | thank you dviroel - that is all from my side | 12:26 |
dviroel | ack, thanks for bringing this topic | 12:26 |
dviroel | also tks marios o/ | 12:26 |
dviroel | ok, let's move on | 12:27 |
dviroel | #topic Bug Triage | 12:27 |
dviroel | first one | 12:27 |
dviroel | https://bugs.launchpad.net/watcher/+bug/2098374 | 12:28 |
dviroel | there are kind of 2 issues in there | 12:29 |
dviroel | one with the audit | 12:29 |
dviroel | other one with the strategy itself? | 12:29 |
jgilaber | the first issue could maybe be related to the eventlet issue? | 12:31 |
amoralej | workload_cache is missing info about this instance.uuid apparently | 12:33 |
jgilaber | the second issue seems to happen in https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L162 | 12:33 |
amoralej | may also be related to sync between model and something else ... | 12:33 |
jgilaber | the workload_cache is generated by https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L206 | 12:34 |
dviroel | an exception in the strategy will move the audit to failed, and from there the audit can only be deleted | 12:34 |
jgilaber | there is some logging in that method, more logs could maybe tell us more | 12:34 |
dviroel | it seems that it is working as expected and documented | 12:35 |
sean-k-mooney | audits are not mutable today | 12:35 |
dviroel | #link https://docs.openstack.org/watcher/latest/_images/audit_state_machine.png | 12:36 |
amoralej | everything seems to be coming from the model ... | 12:36 |
sean-k-mooney | so if there is an exception i think it's expected to move to failed and then only delete or archive would make sense | 12:36 |
sean-k-mooney | so ya i think this is expected | 12:36 |
dviroel | the issue is really the workload_balance KeyError exception | 12:37 |
dviroel | but it is not what was reported in this bug | 12:37 |
sean-k-mooney | well i think there are two things | 12:37 |
sean-k-mooney | if there is an unrecoverable error it's expected to go to failed | 12:38 |
sean-k-mooney | but if we are not properly handling effectively "instance not found" | 12:38 |
sean-k-mooney | that is a logic bug | 12:38 |
amoralej | but it's continuous, it should keep as ongoing and run on next planned execution | 12:39 |
sean-k-mooney | yes and no. really, a continuous audit should be two separate concepts | 12:39 |
sean-k-mooney | in the api we should have a distinction between the trigger/scheduler and the actual audit | 12:40 |
sean-k-mooney | since we don't have that distinction today | 12:40 |
sean-k-mooney | it would be valid for the continuous audit type to perhaps reset to pending | 12:41 |
sean-k-mooney | when the next interval elapses | 12:41 |
sean-k-mooney | but fundamentally this is a design flaw in the api IMO | 12:41 |
amoralej | i doubt if pending or ongoing, tbh, once an audit is assigned into a decision-engine it's ongoing, iiuc | 12:42 |
amoralej | but yeah, api is confusing | 12:42 |
sean-k-mooney | we are currently conflating when to do an audit and what to audit | 12:42 |
sean-k-mooney | allowing a transition to ongoing may be valid but i'm not sure it's a bug | 12:44 |
sean-k-mooney | it's a semantic api change so i don't think this would be backportable | 12:44 |
sean-k-mooney | we could convert this into a feature | 12:45 |
dviroel | +1 | 12:45 |
sean-k-mooney | and add a "recoverable property" to the audit to opt in to requeuing on failure, perhaps with an upper limit on retries without success | 12:45 |
amoralej | so you mean in having a separate api element to represent audit execution ? | 12:45 |
sean-k-mooney | i.e. add retry=5 | 12:46 |
sean-k-mooney | amoralej: well that's another option yes | 12:46 |
amoralej | so, audit_execution would be FAILED, but continuous AUDIT would stay ONGOING meaning it will do a new execution according to schedule | 12:46 |
dviroel | i would say that the current bug report is invalid, and a RFE should be created | 12:46 |
dviroel | and maybe a new bug for the workload_balance should also be open | 12:47 |
sean-k-mooney | we could triage this as wishlist, with the rfe tag and say we should address this with a spec | 12:47 |
amoralej | something interesting would be max_failures for a continuous audit | 12:47 |
sean-k-mooney | amoralej: that's what retry=5 was intended to be | 12:47 |
amoralej | which is similar to that, yep | 12:48 |
sean-k-mooney | dviroel: i would suggest marking this as opinion, wishlist and adding rfe to the tags if no one objects? | 12:48 |
sean-k-mooney | dviroel: and perhaps have another bug for the specific key error | 12:49 |
dviroel | ack, I can also add meeting logs to the bug | 12:49 |
sean-k-mooney | as you suggested | 12:49 |
amoralej | and wrt the KeyError issue, i don't know how it hit that, tbh | 12:49 |
amoralej | +1 to split in two | 12:50 |
sean-k-mooney | well the instance was likely deleted | 12:50 |
amoralej | but it's getting the data from the model not from live nova | 12:50 |
sean-k-mooney | it's getting a mix in some cases i think | 12:50 |
amoralej | so, unless it deleted and resynced the model in the middle of the execution ... | 12:50 |
amoralej | or I may be misreading the code | 12:51 |
sean-k-mooney | the key error is probably a valid bug to go fix | 12:51 |
dviroel | yep, requires more investigation | 12:51 |
dviroel | let's move on, I can do the split after the meeting, and update the current one | 12:52 |
dviroel | #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2100741 | 12:52 |
sean-k-mooney | ack | 12:52 |
dviroel | chandankumar is working on this | 12:52 |
chandankumar | dviroel: I have opened this one. functional tests should live in python-watcherclient | 12:52 |
chandankumar | I have all the reviews up and ci jobs are passing. | 12:53 |
dviroel | chandankumar: ack, can you update the bug report? with status, assignee, progress | 12:53 |
chandankumar | sure | 12:54 |
dviroel | any other concern on this one? | 12:54 |
dviroel | ok, just one more for today | 12:55 |
dviroel | #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2090853 | 12:55 |
dviroel | sean-k-mooney: ^ do you think that we already covered this one? or there is place for more? | 12:55 |
dviroel | just fyi, we added support for prometheus in make_instance_statistic method | 12:57 |
dviroel | #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141 (Add support for prometheus datasource in scenario tests) | 12:57 |
sean-k-mooney | sorry got pinged for something else | 12:58 |
sean-k-mooney | not entirely | 12:59 |
sean-k-mooney | we have to some degree | 12:59 |
sean-k-mooney | we can probably close it | 12:59 |
sean-k-mooney | but i still think we need to refactor some of the tests that don't need metrics to not use them | 12:59 |
dviroel | I can add more info to the bug, wrt the changes merged recently | 12:59 |
dviroel | ack, agree | 13:00 |
sean-k-mooney | the hardcoding has been addressed | 13:00 |
sean-k-mooney | so i think we can likely close this and file a separate bug for "tests that inject metrics they do not use" | 13:00 |
dviroel | sean-k-mooney: ack | 13:00 |
* dviroel | time check | 13:00 |
dviroel | going to skip the next topic, since we already have a volunteer to chair next week meeting | 13:01 |
dviroel | thanks mtembo o/ | 13:01 |
dviroel | let's wrap up for today | 13:01 |
dviroel | thank you all for participating | 13:01 |
dviroel | #endmeeting | 13:02 |
opendevmeet | Meeting ended Thu Mar 6 13:02:02 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 13:02 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/watcher/2025/watcher.2025-03-06-12.00.html | 13:02 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/watcher/2025/watcher.2025-03-06-12.00.txt | 13:02 |
opendevmeet | Log: https://meetings.opendev.org/meetings/watcher/2025/watcher.2025-03-06-12.00.log.html | 13:02 |
amoralej | thanks dviroel ! | 13:02 |
opendevreview | Douglas Viroel proposed openstack/watcher master: Ignore node_exporter cpu metrics in prometheus job https://review.opendev.org/c/openstack/watcher/+/943637 | 20:04 |
*** haleyb is now known as haleyb|out | 22:48 |
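
Note on the KeyError discussed at 12:33-12:38: the log says workload_cache appears to be missing an entry for an instance.uuid, and that not handling "instance not found" would be a logic bug. A minimal sketch of one defensive pattern follows. It is not the actual code in workload_balance.py; the function name `pick_instances` and the shape of the cache (a plain dict keyed by instance UUID) are assumptions made for illustration only.

```python
import logging

LOG = logging.getLogger(__name__)


def pick_instances(instance_uuids, workload_cache):
    """Return (uuid, workload) pairs, skipping instances absent from the cache.

    Hypothetical helper: it only illustrates replacing a direct
    workload_cache[uuid] lookup (which raises KeyError) with a guarded one.
    """
    picked = []
    for uuid in instance_uuids:
        workload = workload_cache.get(uuid)
        if workload is None:
            # The instance may have been deleted or not yet synced to the model.
            LOG.warning("No workload data for instance %s; skipping", uuid)
            continue
        picked.append((uuid, workload))
    return picked
```

Whether skipping is the right behaviour, as opposed to failing the audit, is exactly the open question from the meeting; the sketch only shows the mechanics.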
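
Note on the retry=5 / max_failures idea discussed at 12:45-12:48: a minimal sketch of how a continuous audit could stay ONGOING while individual executions fail, up to an upper limit on retries without success. This is a hypothetical model, not Watcher's audit object or API; the class name, the field names and the choice to reset the counter after a successful execution are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ONGOING = "ONGOING"
    FAILED = "FAILED"


@dataclass
class ContinuousAudit:
    """Hypothetical continuous audit with a cap on consecutive failures."""

    max_failures: int = 5  # the "retry=5" idea from the discussion
    consecutive_failures: int = 0
    state: State = State.ONGOING

    def record_execution(self, succeeded: bool) -> State:
        """Record one scheduled execution and return the resulting audit state."""
        if self.state is State.FAILED:
            return self.state  # FAILED is terminal in this sketch
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                self.state = State.FAILED
        return self.state
```

For example, `ContinuousAudit(max_failures=2)` stays ONGOING after a single failed execution and moves to FAILED only after two failures in a row.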