Nick | Message | Time |
---|---|---|
opendevreview | Merged openstack/watcher-tempest-plugin master: Create a function for deleting old injected metrics on prometheus https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954433 | 03:31 |
opendevreview | David proposed openstack/watcher-tempest-plugin master: Add custom flavor creation for RAM based stategy tests https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/953853 | 07:21 |
opendevreview | David proposed openstack/watcher-tempest-plugin master: Add custom flavor creation for RAM based stategy tests https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/953853 | 08:59 |
opendevreview | David proposed openstack/watcher master: Disable real metrics on devstack injected data jobs https://review.opendev.org/c/openstack/watcher/+/955281 | 10:50 |
opendevreview | David proposed openstack/watcher-tempest-plugin master: Add custom flavor creation for RAM based stategy tests https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/953853 | 10:51 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Enable storage model collector by default https://review.opendev.org/c/openstack/watcher/+/951323 | 10:51 |
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume retype with zone migration https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954625 | 10:52 |
opendevreview | David proposed openstack/watcher-tempest-plugin master: Add custom flavor creation for RAM based stategy tests https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/953853 | 11:01 |
rlandy | hello all, Watcher IRC meeting here in 35 minutes ... please add any topics to: https://etherpad.opendev.org/p/openstack-watcher-irc-meeting. Thank you | 11:26 |
rlandy | #startmeeting Watcher IRC Meeting - July 17, 2025 | 12:00 |
opendevmeet | Meeting started Thu Jul 17 12:00:37 2025 UTC and is due to finish in 60 minutes. The chair is rlandy. Information about MeetBot at http://wiki.debian.org/MeetBot. | 12:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 12:00 |
opendevmeet | The meeting name has been set to 'watcher_irc_meeting___july_17__2025' | 12:00 |
rlandy | Hi all ... who is around? | 12:01 |
amoralej | o/ | 12:01 |
morenod | o/ | 12:01 |
rlandy | courtesy ping list: dviroel jgilaber sean-k-mooney | 12:01 |
jgilaber | o/ | 12:01 |
dviroel | o/ | 12:01 |
rlandy | chandankumar is away atm ... let's start | 12:03 |
rlandy | #topic: (chandan|not around today): Move workload_balance_strategy_cpu|ram tests to exclude list to unblock upstream watcher reviews | 12:03 |
rlandy | #link: https://launchpad.net/bugs/2116875 | 12:03 |
dviroel | right, it is failing more often now | 12:04 |
dviroel | so we can skip the tests while working on a fix | 12:04 |
dviroel | morenod is already investigating and knows what is happening I think | 12:05 |
rlandy | any objections to merging this and reverting when the fix is in? | 12:05 |
sean-k-mooney | o/ | 12:06 |
amoralej | is it expected that different runs of the same job have different node sizes? | 12:06 |
morenod | I've added a comment on the bug, the problem is that compute nodes are sometimes 8vcpus and sometimes 4vcpus... we need to create the test with a dynamic threshold, based on the number of vcpus of the compute nodes | 12:06 |
sean-k-mooney | im fine with skipping it for now. the other way to do that is to use the skip_because decorator in the tempest plugin | 12:06 |
sean-k-mooney | that takes a bug ref | 12:06 |
sean-k-mooney | but im ok with the regex approach for now | 12:07 |
sean-k-mooney | skip_because is slightly less work | 12:07 |
morenod | Im also ok with skipping, I will need a few more days to have the fix | 12:07 |
sean-k-mooney | because it will skip everywhere | 12:07 |
sean-k-mooney | if there isn't a patch already i would prefer to use https://github.com/openstack/tempest/blob/master/tempest/lib/decorators.py#L60 | 12:08 |
rlandy | morenod: reading your comment, the fix will take some time? | 12:08 |
sean-k-mooney | you just add it like this https://github.com/openstack/tempest/blob/master/tempest/api/image/v2/admin/test_image_task.py#L98 | 12:09 |
dviroel | sean-k-mooney: yeah, it is preferable | 12:09 |
morenod | rlandy, im working on it now, maybe sometime between tomorrow and monday it will be ready | 12:09 |
morenod | I like the skip_because solution, it is very clear | 12:10 |
jgilaber | amoralej, I can't find the nodeset definitions, but it could be possible that different providers have a label with the same name but using different flavours | 12:11 |
rlandy | #action rlandy to contact chandankumar to review above suggestions while morenod finishes real fix | 12:11 |
amoralej | i guess that's what is happening, i thought there was a consensus about nodeset definitions | 12:11 |
sean-k-mooney | morenod: there is also a tempest cli command to list all the decorated tests i believe, so you can keep track of them over time | 12:11 |
amoralej | anyway, good to adjust the threshold to the actual node sizes | 12:12 |
sean-k-mooney | keep in mind the node size can differ upstream vs downstream and even in upstream | 12:12 |
morenod | related but not related to this issue, we disabled in the node_exporter in the watcher-operator, but not on devstack based jobs. I have created this review for that https://review.opendev.org/c/openstack/watcher/+/955281 | 12:12 |
sean-k-mooney | upstream we should always have at least 8GB of ram but we can have 4 or 8 cpus depending on performance | 12:12 |
dviroel | yes, we can run these tests anywhere, so it should be adjusted to node specs | 12:12 |
morenod | we will have dynamic flavors to fix RAM and dynamic threshold to fix CPU | 12:13 |
sean-k-mooney | that's an approach, and one that compute has used with some success in whitebox, but it's not always easy to do | 12:13 |
sean-k-mooney | but ok lets see what that looks like | 12:14 |
rlandy | anything more on this topic? | 12:15 |
sean-k-mooney | crickets generally means we can move on :) | 12:16 |
rlandy | thank you for the input - will alert chandankumar to review the conversation | 12:16 |
rlandy | #topic: (dviroel) Eventlet Removal | 12:16 |
rlandy | dviroel, do you want to take this one? | 12:16 |
dviroel | yes | 12:17 |
dviroel | #link https://etherpad.opendev.org/p/watcher-eventlet-removal | 12:17 |
dviroel | the etherpad has links to the changes ready for review | 12:17 |
dviroel | i also added to the meeting etherpad | 12:17 |
dviroel | tl;dr; the decision engine changes are ready for review | 12:17 |
dviroel | there are other discussions that are not code related like | 12:18 |
dviroel | should we keep a prometheus-threading job as voting? | 12:18 |
dviroel | which we can discuss in the change itself | 12:18 |
sean-k-mooney | hum | 12:19 |
sean-k-mooney | so i think we want to run with both versions and perhaps start with it as non voting for now | 12:19 |
dviroel | in the same line, I added a new tox py3 job, to run a subset of tests with eventlet patching disabled | 12:20 |
sean-k-mooney | but if we are going to officially support both models in 2025.2 then we should make it voting before m3 | 12:20 |
dviroel | #link https://review.opendev.org/c/openstack/watcher/+/955097 | 12:20 |
sean-k-mooney | what i would suggest is let's start with it as non voting and look to make the threading jobs voting around the start of august | 12:21 |
dviroel | sean-k-mooney: right, I can add that as a task for m3, to move to voting | 12:21 |
dviroel | and we can look at the job's history | 12:21 |
sean-k-mooney | for the unit test job if its passing i would be more aggressive with that and make it voting right away | 12:21 |
dviroel | ack, it is passing now, but skipping the 'applier' ones, which will be part of the next effort, to add support to the applier too | 12:22 |
sean-k-mooney | ya that's what we are doing in nova as well | 12:22 |
sean-k-mooney | we have 75% of the unit tests passing, maybe higher | 12:23 |
sean-k-mooney | so we are using an exclude list to skip the failing ones and burning that down | 12:23 |
dviroel | nice | 12:23 |
sean-k-mooney | on https://review.opendev.org/c/openstack/watcher/+/952499/4 | 12:24 |
sean-k-mooney | 1 you wrote it so it has an implicit +2 | 12:24 |
sean-k-mooney | but i have also left it open now for about a week | 12:24 |
sean-k-mooney | so i was planning to +w it after the meeting if there were no other objections | 12:24 |
dviroel | ++ | 12:24 |
dviroel | i see no objections :) | 12:25 |
sean-k-mooney | by the way the watcher-prometheus-integration-threading job failed on the unit test patch, which is partly why i want to keep it non-voting for a week or two to make sure that's not a regular thing | 12:25 |
dviroel | tks sean-k-mooney | 12:25 |
sean-k-mooney | oh it was just test_execute_workload_balance_strategy_cpu | 12:26 |
dviroel | sean-k-mooney: but failing | 12:26 |
dviroel | yeah | 12:26 |
sean-k-mooney | that's the instability we discussed above | 12:26 |
dviroel | i was about to say that | 12:26 |
sean-k-mooney | ok well that's a good sign | 12:26 |
dviroel | and the same issue can block the decision engine patch to merge too, just fyi | 12:27 |
dviroel | or trigger some rechecks | 12:27 |
dviroel | so maybe we could wait for the skip if needed | 12:27 |
dviroel | lets see | 12:27 |
sean-k-mooney | ack, i may not have time to complete my review of the 2 later patches today but we can try to get those merged sometime next week i think | 12:28 |
dviroel | ack | 12:28 |
dviroel | there is one more: | 12:28 |
dviroel | #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264 | 12:28 |
dviroel | it adds a new scenario test, with continuous audit | 12:28 |
sean-k-mooney | ya that's not really eventlet removal as such | 12:28 |
dviroel | there is a specific scenario that I wanted to test, which needs 2 audits to be created | 12:28 |
sean-k-mooney | just missing test coverage | 12:28 |
dviroel | ack | 12:29 |
dviroel | it is a scenario that fails when we move to threading mode | 12:29 |
sean-k-mooney | i see, do you know why? | 12:29 |
sean-k-mooney | did you update the default executor for apscheduler | 12:30 |
sean-k-mooney | to not use green pools in your threading patch | 12:30 |
dviroel | today continuous audit is started at Audit Endpoint constructor, before the main decision engine service fork | 12:31 |
dviroel | so this thread was running on a different process | 12:31 |
dviroel | and getting an outdated model | 12:31 |
sean-k-mooney | is that adressed by https://review.opendev.org/c/openstack/watcher/+/952499/4 | 12:31 |
sean-k-mooney | it should be right? | 12:32 |
dviroel | #link https://review.opendev.org/c/openstack/watcher/+/952257 | 12:32 |
dviroel | is the one that addresses that | 12:32 |
sean-k-mooney | oh ok | 12:32 |
sean-k-mooney | so when that merges the new scenario test should pass | 12:32 |
dviroel | here https://review.opendev.org/c/openstack/watcher/+/952257/9/watcher/decision_engine/service.py | 12:32 |
sean-k-mooney | can you add a depend on to the tempest change to show that | 12:32 |
dviroel | there is already | 12:32 |
dviroel | there is also one DNM patch that shows the failure too | 12:33 |
sean-k-mooney | not that i can see https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264 | 12:33 |
dviroel | #link https://review.opendev.org/c/openstack/watcher/+/954364 | 12:33 |
sean-k-mooney | oh you have the depend on in the wrong direction | 12:33 |
dviroel | reproduces the issue | 12:33 |
sean-k-mooney | it needs to be from watcher-tempest-plugin -> watcher in this case | 12:34 |
sean-k-mooney | well | 12:34 |
dviroel | sean-k-mooney: yes and no, because the tempest change is passing too, in other jobs | 12:34 |
sean-k-mooney | i guess we could merge the tempest test first assuming it passes in eventlet mode | 12:34 |
dviroel | correct, there are other jobs that will run that test too | 12:35 |
sean-k-mooney | ok i assume the last two failures of the prometheus job are also the real data tests? | 12:35 |
dviroel | #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1f5/openstack/1f55b6937c9b47a9afb510b960ef12ea/testr_results.html | 12:35 |
dviroel | passing on watcher-tempest-strategies with eventlet | 12:35 |
sean-k-mooney | actually since we are talking about jobs the tempest repo does have watcher-tempest-functional-2024-1 | 12:36 |
dviroel | chandan added a comment about the failures, yes | 12:36 |
sean-k-mooney | i.e. jobs for the stable branches but we should add a version of watcher-prometheus-integration for 2025.1 | 12:36 |
dviroel | ++ | 12:37 |
sean-k-mooney | to make sure we do not break epoxy with prometheus as we extend the test suite | 12:37 |
sean-k-mooney | ok we can do that separately, i think we can move on. ill take a look at that test today/tomorrow and we can likely proceed with it | 12:38 |
dviroel | sure | 12:38 |
dviroel | I think that I covered everything, any other question? | 12:38 |
sean-k-mooney | just a meta one | 12:38 |
sean-k-mooney | it looks like the decision engine will be done this cycle | 12:39 |
sean-k-mooney | how do you feel about the applier | 12:39 |
sean-k-mooney | also what is the status of the api. are we eventlet free there? | 12:39 |
dviroel | ack, I will still look at how the decision engine performs, in terms of resource usage and default number of workers, but it is almost done with these changes | 12:40 |
sean-k-mooney | well for this cycle it wont be the default so we can tweak those as we gain experience with it | 12:40 |
dviroel | ++ | 12:40 |
sean-k-mooney | i would start small ie 4 workers max | 12:40 |
sean-k-mooney | *threads in the pool not workers | 12:41 |
dviroel | ack, so one more change for dec-eng is expected for this | 12:41 |
dviroel | yes, in the code is called workers, but yes, nb of threads in the pool | 12:41 |
sean-k-mooney | well we have 2 concepts | 12:42 |
dviroel | sean-k-mooney: i plan to work in the applier within this cycle, but not sure if we are going to have it working until the end of the cycle | 12:42 |
sean-k-mooney | workers in oslo means the number of processes normally | 12:42 |
sean-k-mooney | oh i see... CONF.watcher_decision_engine.max_general_workers | 12:43 |
sean-k-mooney | so watcher is using workers for eventlet already | 12:43 |
sean-k-mooney | ok | 12:43 |
sean-k-mooney | so in nova we are intentionally adding new config options | 12:44 |
sean-k-mooney | because the default likely wont be the same but ill look at what watcher has today and comment in the review | 12:44 |
dviroel | ack, the background scheduler is one that has no config for instance | 12:44 |
sean-k-mooney | ok its 4 already https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.max_general_workers | 12:45 |
dviroel | and this one ^ - is for the decision engine threadpool | 12:45 |
sean-k-mooney | ya i think its fine. normally the eventlet pool size in most servers is set to around 10000 | 12:45 |
sean-k-mooney | which would obviously be a problem | 12:45 |
sean-k-mooney | but 4 is fine | 12:45 |
dviroel | decision engine threadpool today covers the model synchronize threads | 12:46 |
sean-k-mooney | ack | 12:46 |
dviroel | ok, I think that we can move on | 12:46 |
dviroel | and continue in gerrit | 12:46 |
rlandy | thanks dviroel | 12:46 |
dviroel | tks sean-k-mooney | 12:46 |
rlandy | there were no other reviews added on list | 12:47 |
rlandy | anyone want to raise any other patches needing review now? | 12:47 |
rlandy | k - moving on ... | 12:48 |
sean-k-mooney | i have a topic for the end of the meeting but its not strictly related to a patch | 12:48 |
rlandy | oops | 12:48 |
sean-k-mooney | we can move on | 12:48 |
rlandy | ok - well - bug triage and then all yours | 12:48 |
rlandy | #topic: Bug Triage | 12:48 |
rlandy | Looking at the status of the watcher related bugs: | 12:49 |
rlandy | #link: https://bugs.launchpad.net/watcher/+bugs | 12:49 |
rlandy | has 33 bugs listed | 12:49 |
rlandy | 7 of which are in progress | 12:49 |
rlandy | and 2 incomplete ... | 12:50 |
rlandy | https://bugs.launchpad.net/watcher/+bugs?orderby=status&start=0 | 12:50 |
rlandy | #link https://bugs.launchpad.net/watcher/+bug/1837400 | 12:50 |
rlandy | ^^ only that one is marked "new" | 12:50 |
rlandy | dashboard, client and tempest are all under control with 2 or 3 bugs either in progress or doc related | 12:51 |
sean-k-mooney | the bug seems valid if it still happens | 12:51 |
sean-k-mooney | however i agree that its low priority | 12:51 |
sean-k-mooney | we marked it as need-re-triage | 12:51 |
rlandy | so raising only this one today: | 12:51 |
sean-k-mooney | because i think we wanted to see if this was fixed | 12:51 |
rlandy | https://bugs.launchpad.net/watcher/+bug/1877956 (bug about canceling action plans) | 12:52 |
rlandy | as work was done to fix canceling action plans and greg and I tested it yesterday (admittedly from the UI) and that is now working | 12:53 |
dviroel | we found evidence in the code, but I didn't try to reproduce | 12:53 |
sean-k-mooney | so this was just a looing bug i think | 12:53 |
sean-k-mooney | when i looked at it before i think this is still a problem | 12:53 |
dviroel | should be a real one | 12:53 |
dviroel | and also easy to fix | 12:54 |
sean-k-mooney | yep | 12:54 |
sean-k-mooney | do we have a tempest test for cancellation yet | 12:54 |
sean-k-mooney | i dont think so right | 12:54 |
sean-k-mooney | i think we can do this by using the sleep action and maybe the actuator strategy | 12:55 |
rlandy | not as far as I know | 12:55 |
dviroel | yeah, a good opportunity to add one too | 12:55 |
sean-k-mooney | i think we should keep this open and just fix the issue when we have time | 12:55 |
sean-k-mooney | ill set it to low? | 12:55 |
dviroel | ++ | 12:55 |
sean-k-mooney | cool | 12:56 |
rlandy | ok - that's it for triage ... | 12:56 |
rlandy | sean-k-mooney: your topic? | 12:56 |
sean-k-mooney | ya... | 12:56 |
sean-k-mooney | so who has heard of the service-types-authority repo? | 12:56 |
amoralej | i haven't | 12:57 |
sean-k-mooney | for wider context https://specs.openstack.org/openstack/service-types-authority/ its a thing that was created a very long time ago and is not documented as part of the project creation process | 12:57 |
jgilaber | me neither | 12:57 |
sean-k-mooney | i discovered or rediscovered it tuesday night/yesterday | 12:58 |
sean-k-mooney | Aetos is not listed there and "prometheus" does not follow the required naming conventions | 12:58 |
sean-k-mooney | so the keystone endpoint they want to use, specifically the service-type | 12:58 |
sean-k-mooney | is not valid | 12:59 |
sean-k-mooney | so they are going to have to create a service type, "tenant-metrics" is my suggestion | 12:59 |
sean-k-mooney | then we need to update the spec | 12:59 |
sean-k-mooney | and use that | 12:59 |
sean-k-mooney | but we need to get the tc to approve that and we need to tell the telemetry team about this requirement | 13:00 |
sean-k-mooney | i spent a while on the tc channel trying to understand this yesterday | 13:00 |
sean-k-mooney | so ya we need to let juan and jaromir know | 13:01 |
amoralej | did the telemetry team start using the wrong names somewhere? | 13:01 |
sean-k-mooney | they planned to start using prometheus | 13:01 |
sean-k-mooney | for Aetos | 13:01 |
amoralej | at least no need to revert any code, i hope :) | 13:01 |
sean-k-mooney | not yet | 13:02 |
sean-k-mooney | but watcher will need to know the name to do the check for the endpoint | 13:02 |
sean-k-mooney | and the installer will need to use the correct name too | 13:02 |
sean-k-mooney | the other thing i found out | 13:02 |
sean-k-mooney | is we are using the legacy name for watcher downstream i think | 13:02 |
sean-k-mooney | https://opendev.org/openstack/service-types-authority/src/branch/master/service-types.yaml#L31-L34 | 13:03 |
sean-k-mooney | its official service-type should be resource-optimization not infra-optim | 13:03 |
dviroel | oh, good to know | 13:03 |
sean-k-mooney | so that's a downstream bug that we should fix in the operator | 13:03 |
sean-k-mooney | both are technically valid but it would be better to use the non alias version | 13:04 |
sean-k-mooney | so jaromir i believe is on pto for the next week or two | 13:04 |
sean-k-mooney | so we need to sync with the telemetry folks on whether we or they can update the service-types-authority file with the right content | 13:05 |
sean-k-mooney | anyway that's all i had on this | 13:05 |
dviroel | tks for finding and pursuing this issue sean-k-mooney | 13:06 |
rlandy | thanks for raising this - a lot of PTOs atm ... mtunge is also out from next week so maybe we try juan if possible | 13:06 |
rlandy | we are over time so I'll move on to ... | 13:07 |
sean-k-mooney | it was mainly by accident, i skim the tc meeting notes and the repo came up this week | 13:07 |
sean-k-mooney | or last | 13:07 |
sean-k-mooney | ya we can wrap up and move on | 13:07 |
rlandy | Volunteers to chair next meeting: | 13:08 |
opendevreview | Merged openstack/watcher master: Merge decision engine services into a single one https://review.opendev.org/c/openstack/watcher/+/952499 | 13:09 |
dviroel | o/ | 13:09 |
dviroel | I can chair | 13:09 |
rlandy | thank you dviroel | 13:09 |
rlandy | much appreciated | 13:09 |
rlandy | k folks ... closing out | 13:09 |
rlandy | thank you for attending | 13:09 |
rlandy | #endmeeting | 13:09 |
opendevmeet | Meeting ended Thu Jul 17 13:09:43 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 13:09 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/watcher_irc_meeting___july_17__2025/2025/watcher_irc_meeting___july_17__2025.2025-07-17-12.00.html | 13:09 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/watcher_irc_meeting___july_17__2025/2025/watcher_irc_meeting___july_17__2025.2025-07-17-12.00.txt | 13:09 |
opendevmeet | Log: https://meetings.opendev.org/meetings/watcher_irc_meeting___july_17__2025/2025/watcher_irc_meeting___july_17__2025.2025-07-17-12.00.log.html | 13:09 |
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume retype with zone migration https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954625 | 14:20 |
opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Mark workload_balance_strategy_cpu|ram as unstable tests https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955302 | 15:12 |
chandankumar | sean-k-mooney: dviroel ^^ skipped the tests in watcher tempest plugin using unstable_tests. thank you ! | 15:13 |
sean-k-mooney | i guess we can use unstable_test | 15:22 |
sean-k-mooney | that passes if the test passes and skips if it failed | 15:22 |
sean-k-mooney | right? | 15:22 |
chandankumar | yes, correct, skip it if it fails | 15:22 |
opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Mark workload_balance_strategy_cpu|ram as unstable tests https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955302 | 15:23 |
sean-k-mooney | i might actually bring this up in the qa/nova channel because there are a number of volume tests that probably should have this applied | 15:23 |
sean-k-mooney | but ok im happy with this | 15:23 |
chandankumar | sure, Once I have the results, I will share with gmaan also for review. | 15:24 |
sean-k-mooney | well i didnt mean for this test | 15:24 |
sean-k-mooney | for 3-4 years now there are a set of 2-3 volume test in tempest that frequently fail | 15:25 |
chandankumar | ah ok | 15:25 |
sean-k-mooney | we might want to just acknowledge that and mark them as such | 15:25 |
gmaan | chandankumar: sean-k-mooney honestly, the unstable_test decorator is really not the better choice, and we have often forgotten to check the stability of those tests because there is no way to do that. I mean no UI etc | 15:25 |
sean-k-mooney | due to some interaction with qemu some of the volume tests tend to trigger kernel panics in the guest | 15:25 |
gmaan | instead of unstable_test I will suggest to skip the test and unskip when it is fixed | 15:26 |
sean-k-mooney | well i suggested the skip_because decorator instead | 15:26 |
sean-k-mooney | chandankumar choose to use the unstable_test one | 15:26 |
sean-k-mooney | we know what the issue is its just going to take a few days to get it fixed | 15:26 |
gmaan | ++ that is explicit and we know there is one test we need to fix as it is skipped | 15:26 |
sean-k-mooney | if you think its better to just use skip_because then chandankumar can you update to that? | 15:27 |
gmaan | unstable_test is really for new tests and we do not know how it will behave in all diff config jobs and mainly reace condition | 15:27 |
gmaan | race | 15:27 |
chandankumar | yes sure | 15:27 |
gmaan | thanks | 15:27 |
sean-k-mooney | gmaan: well you know the volume attach/detach one im thinking of for cinder right | 15:28 |
gmaan | I will update unstable_test doc in Tempest to clear that, that is something we are missing there | 15:28 |
sean-k-mooney | gmaan: we talked about just not running them in nova in the past because they were so unstable | 15:28 |
sean-k-mooney | that has decreased a lot after i added zswap and some other performance tuning | 15:28 |
sean-k-mooney | gmaan: what i would prefer to do but never had time | 15:28 |
gmaan | sean-k-mooney: yeah, as you know, we fixed/improved it by waiting for the VM to be ssh-able in advance before attach/detach but that does not solve everything | 15:29 |
sean-k-mooney | is add a retry_on_panic or retry_on_<something> decorator to those | 15:29 |
gmaan | yeah zswap ++ | 15:29 |
gmaan | I see. that will be good | 15:30 |
sean-k-mooney | the kernel panic case i think we could eventually solve by using alpine or a different image to replace cirros | 15:30 |
sean-k-mooney | but since we can detect that in the console logs | 15:30 |
sean-k-mooney | we could have tempest retry the test once | 15:30 |
sean-k-mooney | if the vm does panic and avoid a full recheck | 15:30 |
opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Skip workload_balance_strategy_cpu|ram due to known bug https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955302 | 15:31 |
gmaan | I am really on to try new image like alpine ++ | 15:31 |
sean-k-mooney | although that is more a stop the bleeding approach than an actual fix like using ubuntu or alpine would be | 15:31 |
gmaan | ' avoid a full recheck' ? | 15:31 |
sean-k-mooney | i mean right now when we have kernel panics that cause a single test failure we do "recheck kernel panic" and all jobs have to run again | 15:32 |
sean-k-mooney | vs a decorator on the tests we know can panic that just retries that one test once before reporting fail or success | 15:32 |
gmaan | ohk, ++ to that approach | 15:32 |
sean-k-mooney | its partly just time to do it but maybe i can ask ai to do it... | 15:34 |
sean-k-mooney | that's the new standard solution to all tech problems right. we finally found a replacement for just turn it off and on again | 15:35 |
dviroel | chandankumar: ack, i will vote and w+1 after getting zuul report. +1 on skip vs unstable, makes sense | 15:42 |
opendevreview | Douglas Viroel proposed openstack/watcher-tempest-plugin master: Skip workload_balance_strategy_cpu|ram due to known bug https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955302 | 16:10 |
opendevreview | Alfredo Moralejo proposed openstack/watcher master: Add status_message fields to audit, action and actionplan https://review.opendev.org/c/openstack/watcher/+/954745 | 16:14 |
opendevreview | Alfredo Moralejo proposed openstack/watcher master: Skip actions automatically based on pre_condition results https://review.opendev.org/c/openstack/watcher/+/954746 | 16:14 |
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume retype with zone migration https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954625 | 16:20 |
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume retype with zone migration https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954625 | 18:06 |
opendevreview | Douglas Viroel proposed openstack/watcher-tempest-plugin master: Skip execute_workload_balance_strategy_cpu|ram due to known bug https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955302 | 19:14 |
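
The difference between the two decorators discussed in the log (skip_because always skips and records a bug reference, while unstable_test runs the test and turns a failure into a skip) can be sketched with plain `unittest`. This is only an illustration under stated assumptions: the two decorators below are simplified stand-ins written for this note, not tempest's implementation (the real ones live in `tempest/lib/decorators.py`), and `DemoTests` and `run_demo` are hypothetical names. The bug number is the one linked in the meeting.

```python
import functools
import unittest


def skip_because(bug):
    """Stand-in (not tempest's code): always skip and record the bug reference."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            raise unittest.SkipTest(f"Skipped until bug {bug} is resolved")
        return wrapper
    return decorator


def unstable_test(bug):
    """Stand-in (not tempest's code): run the test, convert a failure into a skip."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except unittest.SkipTest:
                # An explicit skip inside the test is passed through untouched.
                raise
            except Exception as exc:
                raise unittest.SkipTest(f"Marked unstable by bug {bug}: {exc}")
        return wrapper
    return decorator


class DemoTests(unittest.TestCase):
    """Hypothetical tests that only exercise the two stand-in decorators."""

    @skip_because(bug="2116875")
    def test_always_skipped(self):
        self.fail("never reached")

    @unstable_test(bug="2116875")
    def test_unstable_failing(self):
        self.fail("simulated flaky failure")

    @unstable_test(bug="2116875")
    def test_unstable_passing(self):
        self.assertTrue(True)


def run_demo():
    """Run the demo tests and return the collected unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

In this sketch a failing unstable test is reported the same way as a deliberately skipped one, which matches gmaan's concern in the log that instability hidden behind unstable_test is easy to forget, whereas an explicit skip makes it clear that a test is waiting on a fix.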
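
morenod's proposal in the meeting, a CPU threshold derived from the number of vCPUs on the compute node instead of a fixed value, could look roughly like the sketch below. Everything here is an assumption made for illustration: the function name, the `busy_vcpus` and `margin` parameters and the formula itself are hypothetical and are not taken from the actual fix.

```python
def cpu_threshold_percent(host_vcpus, busy_vcpus, margin=0.5):
    """Return a CPU threshold (in percent) scaled to the size of the node.

    Hypothetical helper: host_vcpus is the vCPU count of the compute node,
    busy_vcpus is the number of vCPUs the test keeps loaded, and margin
    places the threshold at a fraction of the expected utilisation so the
    loaded host exceeds it regardless of node size.
    """
    if host_vcpus <= 0:
        raise ValueError("host_vcpus must be positive")
    # Expected utilisation of the host, capped at 100 percent.
    utilisation = min(busy_vcpus / host_vcpus, 1.0) * 100
    return round(utilisation * margin, 1)


# The same two-vCPU load on the two node sizes mentioned in the meeting.
threshold_8 = cpu_threshold_percent(host_vcpus=8, busy_vcpus=2)
threshold_4 = cpu_threshold_percent(host_vcpus=4, busy_vcpus=2)
```

With these assumed inputs the threshold is 12.5 on an 8-vCPU node and 25.0 on a 4-vCPU node, so a test would no longer depend on which node size the CI provider hands out.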