Thursday, 2025-03-06

dviroelhi all o/  watcher meeting will start in 5 minutes11:55
dviroel#startmeeting watcher12:00
opendevmeetMeeting started Thu Mar  6 12:00:38 2025 UTC and is due to finish in 60 minutes.  The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot.12:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.12:00
opendevmeetThe meeting name has been set to 'watcher'12:00
dviroelhi all, who's around today? 12:01
rlandyo/12:01
marioso/12:01
mtemboo/12:01
jgilabero/12:01
amoralejo/12:01
dviroelo/  thanks for joining12:02
dviroellet's start with today's meeting agenda12:02
dviroel#link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting (Meeting agenda)12:02
dviroelplease feel free to add your own topics to the agenda if you want to highlight something :) 12:02
dviroelfirst one12:03
dviroel#topic Feature Freeze12:03
dviroel#link https://releases.openstack.org/epoxy/schedule.html (2025.1 Epoxy Release Schedule)12:03
dviroelwe are now in feature freeze period12:03
dviroelmeaning that no new features, configuration changes or even string changes should be merged from now on, or they should at least be avoided12:04
dviroelwe also don't have any FFE requests by now, so we should be ok for next week12:05
dviroelwhich is RC1 target12:05
chandankumaro/12:06
dviroellet me highlight some patches from sean-k-mooney 12:06
dviroel#link https://review.opendev.org/c/openstack/watcher/+/943206 (Add epoxy prelude)12:06
dviroeland12:07
dviroel#link https://review.opendev.org/c/openstack/releases/+/943489 (Add cycle highlights for watcher in 2025.1)12:07
dviroelthanks sean-k-mooney for proposing these patches12:08
sean-k-mooney o/12:08
dviroelif you have any comment or addition to the epoxy prelude, pls add your review there12:08
sean-k-mooneyi would like to merge the prelude patch early next week12:09
sean-k-mooneyi.e. monday12:09
sean-k-mooneyso that it's included in RC112:09
sean-k-mooneyso please provide any feedback before then12:09
dviroel+112:09
dviroelany other comments for this topic?12:10
dviroelok, next one12:11
dviroel#topic (rlandy) we need to sort/prioritize PTG topics12:11
rlandythanks dviroel 12:12
rlandywe have a lot of topics on the etherpad12:12
rlandysome of which may need sessions with other DFGs12:12
rlandyor groups I should say12:12
rlandyso proposing we prioritize this list ... on line or async12:13
rlandywhatever the group chooses12:13
rlandyie: we could do this by voting or meeting so that we can settle on a list and request the meeting slots12:13
mariosi added a sub-point under your agenda item which is related Ronelle12:13
rlandymarios - pls go ahead then12:13
marioswe need to book slots... so perhaps we can book the times, and then folks can start to request particular slots if they want them12:14
mariosi got an email about the need for us to choose meeting slots (since I registered the team a few weeks back and they needed a specific contact)12:14
rlandywell we don't want a chicken/egg situation with slots and topics12:14
rlandyone needs to be sorted first 12:14
marioswe need to select slots in https://ptg.opendev.org/ptg.html12:14
mariosi would propose we book 1300 to 1600 UTC on mon/tue/wed ? ... then we have specific times sorted if it works (we can discuss the proposal but i think that roughly lines up with the folks that have been active thus far at least)12:15
marioswe can always cancel slots if we don't use them 12:15
marios#info etherpads are live https://ptg.opendev.org/etherpads.html12:16
marios#link https://etherpad.opendev.org/p/apr2025-ptg-watcher12:16
mariosso any objections to 1300-1600 UTC for mon/tue/wed? 12:17
mariosotherwise i will book those12:17
dviroelworks for me12:18
mariosrlandy: for the topics. either one person (or few) can try and group these and then place them into 'slots' (eg we can vary the time but something like 45 mins per topic to allow 10 mins break or whatever way we decide to split it), OR12:19
chandankumarabove timing works for me too12:19
jgilabersounds good12:19
amoralejwfm, i think it's probably more than we need, but if we can cancel, no problem12:19
mariosrlandy: OR, we ask the folks who added each of those topics, to add them into the PTG etherpad and we take them in order/as listed 12:19
dviroelsean-k-mooney: not sure if we can avoid conflicts with nova, since there are slots on all days :) 12:19
mariosamoralej: yes exactly better to book and not use it12:19
mariosalso if we don't book we can still decide to meet ;)12:19
sean-k-mooneydviroel: there will be conflicts but i think that will be manageable12:20
mariosit just means it won't be listed and available to anyone that wants to follow12:20
amoralejthen wfm12:20
sean-k-mooneyim not sure either team will need the full slots on all days12:20
rlandymarios: fine with either - as long as we have a system that works to get the most out of the meetings12:20
sean-k-mooneyi do know that nova was considering mon-wed also because some cores are not around later in the week or are also part of the tc12:21
mariosi'm wondering12:21
mariossean-k-mooney: should we book  thu too, in case we want to schedule something with other teams who are busy on mon-wed12:22
mariosbut i don't know if anyone has plans to invite say nova folks 12:22
sean-k-mooneyi was planning to attend their session but we could do it the other way around12:22
sean-k-mooneylet's stick with the current schedule for now and i can reach out to them12:23
sean-k-mooneythe other team i think we should try and sync with is horizon and/or telemetry12:23
sean-k-mooneyim not sure if we will have topics for either but we should look at when they will have their sessions and see if we can organise cross-project topics if they are relevant12:23
dviroelok, so we agree with initial marios slot proposal12:24
mariosthanks dviroel i will book mon-wed 1300-1600 slots 12:24
dviroel#action marios to book ptg slots: mon-wed 1300-160012:24
mariosrlandy: i guess we will re-visit topics next week? or between now and then 12:25
rlandymarios: ack - sure12:25
rlandypeople can vote or self order in the mean time12:25
rlandythank you dviroel  - that is all from my side12:26
dviroelack, thanks for bringing this topic12:26
dviroelalso tks marios o/12:26
dviroelok, lets move on12:27
dviroel#topic Bug Triage12:27
dviroelfirst one12:27
dviroelhttps://bugs.launchpad.net/watcher/+bug/209837412:28
dviroelthere are kind of 2 issues in there12:29
dviroelone with the audit12:29
dviroelother one with the strategy itself?12:29
jgilaberthe first issue could maybe be related to the eventlet issue?12:31
amoralejworkload_cache is missing info about this instance.uuid apparently12:33
jgilaberthe second issue seems to happen in https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L16212:33
amoralejmay also be related to sync between model and something else ...12:33
jgilaberthe workload_cache is generated by https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L20612:34
dviroelan exception in the strategy will move the audit to failed, and from there the audit can only be deleted12:34
jgilaberthere is some logging in that method, more logs could maybe tell us more12:34
dviroelit seems that it is working as expected and documented12:35
sean-k-mooneyaudits are not mutable today12:35
dviroel#link https://docs.openstack.org/watcher/latest/_images/audit_state_machine.png12:36
amoralejeverything seems to be coming from the model ...12:36
sean-k-mooneyso if there is an exception i think it's expected to move to failed and then only delete or archive would make sense12:36
sean-k-mooneyso ya i think this is expected12:36
dviroelthe issue is really the workload_balance KeyError exception12:37
dviroelbut it is not what was reported in this bug12:37
sean-k-mooneywell i think there are two things12:37
sean-k-mooneyif there is an unrecoverable error it's expected to go to failed12:38
sean-k-mooneybut if we are not properly handling effectively "instance not found"12:38
sean-k-mooneythat is a logic bug12:38
amoralejbut it's continuous, it should stay as ongoing and run on next planned execution12:39
sean-k-mooneyyes and no. really, a continuous audit should be two separate concepts12:39
sean-k-mooneyin the api we should have a distinction between the trigger/scheduler and the actual audit12:40
sean-k-mooneysince we don't have that distinction today12:40
sean-k-mooneyit would be valid for the continuous audit type to perhaps reset to pending 12:41
sean-k-mooneywhen the next interval elapses12:41
sean-k-mooneybut fundamentally this is a design flaw in the api IMO12:41
amoraleji doubt if pending or ongoing, tbh, once an audit is assigned into a decision-engine it's ongoing, iiuc12:42
amoralejbut yeah, api is confusing12:42
sean-k-mooneywe are currently conflating when to do an audit and what to audit12:42
sean-k-mooneyallowing a transition to ongoing may be valid but im not sure it's a bug12:44
sean-k-mooneyit's a semantic api change so i don't think this would be backportable12:44
sean-k-mooneywe could convert this into a feature12:45
dviroel+112:45
sean-k-mooneyand add a "recoverable property" to the audit to opt in to requeuing on failure, perhaps with an upper limit on retries without success12:45
amoralejso you mean having a separate api element to represent audit execution ?12:45
sean-k-mooneyi.e. add retry=512:46
sean-k-mooneyamoralej: well that's another option yes12:46
amoralejso, audit_execution would be FAILED, but continuous AUDIT would stay ONGOING, meaning it will do a new execution according to schedule12:46
dviroeli would say that the current bug report is invalid, and an RFE should be created12:46
dviroeland maybe a new bug for the workload_balance should also be opened12:47
sean-k-mooneywe could triage this as wishlist, with the rfe tag and say we should address this with a spec12:47
amoralejsomething interesting would be max_failures for a continuous audit12:47
sean-k-mooneyamoralej: that's what retry=5 was intended to be12:47
amoralejwhich is similar to that, yep12:48
sean-k-mooneydviroel: i would suggest marking this as opinion, wishlist and add rfe to the tags if no one objects ?12:48
sean-k-mooneydviroel: and perhaps have another bug for the specific key error12:49
dviroelack, I can also add meeting logs to the bug12:49
sean-k-mooneyas you suggested12:49
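The retry=5 / max_failures idea discussed above could be sketched roughly as follows. This is only an illustration of the proposal, not Watcher code: the `ContinuousAudit` class, the `AuditState` enum and the `record_execution` method are hypothetical names, and the assumption that a success resets the failure count was not stated in the meeting.

```python
from dataclasses import dataclass
from enum import Enum


class AuditState(Enum):
    ONGOING = "ONGOING"
    FAILED = "FAILED"


@dataclass
class ContinuousAudit:
    """Hypothetical model; not Watcher's actual audit object."""

    max_failures: int = 5  # plays the role of the "retry=5" idea
    state: AuditState = AuditState.ONGOING
    consecutive_failures: int = 0

    def record_execution(self, succeeded: bool) -> AuditState:
        """Record the outcome of one scheduled execution."""
        if self.state is AuditState.FAILED:
            raise RuntimeError("audit already FAILED; it can only be deleted")
        if succeeded:
            # Assumption: any success resets the failure budget.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # Budget exhausted: only now does the audit itself fail.
                self.state = AuditState.FAILED
        return self.state


audit = ContinuousAudit(max_failures=2)
state_after_first_failure = audit.record_execution(succeeded=False)
```

With this split, a single failed execution leaves the continuous audit ONGOING for its next scheduled run, and the audit only moves to FAILED once the failure budget is used up.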
amoralejand wrt the KeyError issue, i don't know how it hit that, tbh12:49
amoralej+1 to split in two12:50
sean-k-mooneywell the instance was likely deleted12:50
amoralejbut it's getting the data from the model not from live nova12:50
sean-k-mooneyit's getting a mix in some cases i think12:50
amoralejso, unless it deleted and resynced the model in the middle of the execution ...12:50
amoralejor I may be misreading the code12:51
sean-k-mooneythe key error is probably a valid bug to go fix12:51
dviroelyep, requires more investigation12:51
dviroellets move on, I can do the split after the meeting, and update the current one12:52
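As a rough illustration of the "instance not found" handling mentioned above, a lookup that tolerates instances missing from a cache might look like this. The function name `usable_workloads` and the dict-shaped cache are assumptions made for the sketch; this is not the actual workload_balance code, and whether skipping is the right behaviour was left for further investigation.

```python
import logging

LOG = logging.getLogger(__name__)


def usable_workloads(instance_uuids, workload_cache):
    """Return (uuid, load) pairs, skipping instances absent from the cache.

    Hypothetical helper: workload_cache is assumed to map uuid -> load.
    """
    pairs = []
    for uuid in instance_uuids:
        load = workload_cache.get(uuid)
        if load is None:
            # Instead of raising KeyError, log the gap and continue.
            LOG.warning("No workload data for instance %s; skipping", uuid)
            continue
        pairs.append((uuid, load))
    return pairs


pairs = usable_workloads(["a", "b"], {"a": 0.0})
```

Using `dict.get` with an explicit `is None` check keeps instances whose load is legitimately zero while still dropping those with no entry at all.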
dviroel#link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/210074112:52
sean-k-mooneyack12:52
dviroelchandankumar: is working on this 12:52
chandankumardviroel: I have opened this one. functional tests should live in python-watcherclient12:52
chandankumarI have all the reviews up and ci jobs are passing.12:53
dviroelchandankumar: ack, can you update the bug report? with status, assignee, progress12:53
chandankumarsure12:54
dviroelany other concern on this one?12:54
dviroelok, just one more for today12:55
dviroel#link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/209085312:55
dviroelsean-k-mooney: ^ do you think that we already covered this one? or is there room for more?12:55
dviroeljust fyi, we added support for prometheus in make_instance_statistic method12:57
dviroel#link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141 (Add support for prometheus datasource in scenario tests)12:57
sean-k-mooneysorry got pinged for something else12:58
sean-k-mooneynot entirely12:59
sean-k-mooneywe have to some degree12:59
sean-k-mooneywe can probably close it12:59
sean-k-mooneybut i still think we need to refactor some of the tests that don't need metrics to not use them12:59
dviroelI can add more info to the bug, wrt the changes merged recently12:59
dviroelack, agree13:00
sean-k-mooneythe hardcoding has been addressed13:00
sean-k-mooneyso i think we can likely close this and file a separate bug for "tests that inject metrics they do not use"13:00
dviroelsean-k-mooney: ack13:00
* dviroel time check13:00
dviroelgoing to skip the next topic, since we already have a volunteer to chair next week's meeting13:01
dviroelthanks mtembo fo/13:01
dviroellet's wrap up for today13:01
dviroelthank you all for participating13:01
dviroel#endmeeting13:02
opendevmeetMeeting ended Thu Mar  6 13:02:02 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)13:02
opendevmeetMinutes:        https://meetings.opendev.org/meetings/watcher/2025/watcher.2025-03-06-12.00.html13:02
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/watcher/2025/watcher.2025-03-06-12.00.txt13:02
opendevmeetLog:            https://meetings.opendev.org/meetings/watcher/2025/watcher.2025-03-06-12.00.log.html13:02
amoralejthanks dviroel !13:02
opendevreviewDouglas Viroel proposed openstack/watcher master: Ignore node_exporter cpu metrics in prometheus job  https://review.opendev.org/c/openstack/watcher/+/94363720:04
*** haleyb is now known as haleyb|out22:48