opendevreview | melanie witt proposed openstack/tempest master: Add tests for creating servers with (anti-)affinity https://review.opendev.org/c/openstack/tempest/+/924691 | 02:35 |
---|---|---|
opendevreview | melanie witt proposed openstack/tempest master: Add tests for creating servers with (anti-)affinity https://review.opendev.org/c/openstack/tempest/+/924691 | 04:15 |
opendevreview | yatin proposed openstack/devstack stable/2023.1: [stable only] Unset the default value of MYSQL_GATHER_PERFORMANCE https://review.opendev.org/c/openstack/devstack/+/924740 | 05:04 |
frickler | weird, I've been seeing a similar failure in kolla-swift jobs, but didn't investigate yet. the swift commands should be part of the basic osc, no plugin needed, so my assumption is that the catalog would be missing the swift endpoint | 05:29 |
opendevreview | yatin proposed openstack/devstack master: [DNM] check abandoned patch https://review.opendev.org/c/openstack/devstack/+/924822 | 06:56 |
opendevreview | Martin Kopec proposed openstack/stackviz master: Run the node job with the current node version https://review.opendev.org/c/openstack/stackviz/+/923751 | 07:12 |
opendevreview | Martin Kopec proposed openstack/stackviz master: Run the node job with the current node version https://review.opendev.org/c/openstack/stackviz/+/923751 | 08:58 |
opendevreview | Amit Uniyal proposed openstack/whitebox-tempest-plugin master: verify tls in guest VM https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/923824 | 10:16 |
opendevreview | Amit Uniyal proposed openstack/whitebox-tempest-plugin master: verify tls in guest VM https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/923824 | 10:27 |
opendevreview | Amit Uniyal proposed openstack/whitebox-tempest-plugin master: verify vencrypt feature https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/923824 | 11:17 |
ykarel | dansmith, kopecmartin can you +W again https://review.opendev.org/c/openstack/devstack/+/924740 , depends-on merged | 12:44 |
ykarel | thx in advance | 12:44 |
dansmith | ykarel: done | 13:32 |
clarkb | gmann: kopecmartin: during the recent mass security fixes stuff frickler pointed out that jobs are never stable enough to land 20 changes at once. The way openstack's gate queue is configured we will start jobs for a minimum of 20 changes then that number can grow if things are stable and merging. A problem with that is gate resets with at least 20 changes in the queue mean a | 14:58 |
clarkb | lot of wasted resources and slowness getting resources where they are most valuable. frickler's idea was to reduce the minimum queue size from 20 changes to say 10 changes to reduce that impact. The suggestion from corvus is to instead make use of early failure detection on zuul jobs so that zuul can more quickly reallocate resources rather than waiting for long tempest jobs | 14:58 |
clarkb | (or other long jobs) to complete before rearranging the queue | 14:58 |
clarkb | I think the greatest impact to start is likely going to be with devstack/tempest jobs simply because they run for quite a long time and have a reasonably strong inheritance tree that should allow us to make a more central update to the zuul config and impact a number of jobs | 14:59 |
clarkb | Is there any interest from the qa team to implement that on the tempest and maybe devstack jobs? The way it works is you define a regex on the job that scans the job output as it is written to detect strings that indicate failures. The zuul project has been dogfooding the functionality and I can point at jobs there as examples | 15:00 |
clarkb | the potential risk is that if our rules are not strict enough we could inappropriately identify successful jobs as failures, so having those somewhat familiar with the jobs involved in the process is important | 15:00 |
opendevreview | Merged openstack/devstack stable/2023.1: [stable only] Unset the default value of MYSQL_GATHER_PERFORMANCE https://review.opendev.org/c/openstack/devstack/+/924740 | 15:10 |
opendevreview | Riccardo Pittau proposed openstack/devstack master: Pin openstack cli command in swift tempurls function https://review.opendev.org/c/openstack/devstack/+/924867 | 16:03 |
opendevreview | James Parker proposed openstack/tempest master: Add configurable hostname pattern to filter hosts https://review.opendev.org/c/openstack/tempest/+/924785 | 19:59 |
clarkb | https://opendev.org/zuul/zuul/src/branch/master/.zuul.yaml#L139-L142 is the way zuul is using it for unittest early failure detection. The docs for that are at https://zuul-ci.org/docs/zuul/latest/config/job.html#attr-job.failure-output | 20:14 |
gmann | clarkb: I am not sure I understood it completely. '..will start jobs for a minimum of 20 changes ' you mean if there are 19 changes in gate then it would not start the jobs to run on any of the changes? | 21:55 |
gmann | and when 20 changes are in gate then failing one job will cause reset to all of the changes? | 21:56 |
clarkb | gmann: no if there are 40 changes in the gate only 20 of them will run jobs | 22:33 |
clarkb | gmann: the suggestion is to reduce that number to 10 | 22:33 |
clarkb | the underlying motivation is that if you have fewer jobs running then a gate reset is less painful. Another approach to addressing that is to reset more quickly so that you aren't consuming the resources unnecessarily long when you know you will discard them | 22:34 |
clarkb | then as you merge things that window size increases. As things fail and reset the window size decreases. It is very similar to tcp slowstart. The idea is to moderate resource consumption so that large queues and consistent failures are less painful | 22:40 |
gmann | clarkb: ohk, thanks I got it now. with early failure my concern is 1. we would not be able to know all the failing tests 2. we would not know the result of the test that the change is impacting and we might need to re-run it if the change leads to a valid test failure | 22:49 |
clarkb | gmann: to clarify early failure detection does not stop the job from running. It merely allows zuul to act on the resulting restart quicker | 22:49 |
clarkb | the jobs will still run to completion and report normally | 22:49 |
corvus | (ie, zuul will know the job is going to fail before it actually fails) | 22:50 |
clarkb | but that means all the jobs running behind it that will be restarted either at the early detection point or at job completion as they do today can be reset onto the new speculative git head and started again | 22:51 |
gmann | clarkb: ohk, so no impact from the job running perspective, zuul just detects and acts in advance. | 22:51 |
clarkb | that means all the resources for those jobs behind are released and reused more quickly | 22:51 |
clarkb | gmann: yes pretty much. The one gotcha to that is if the early detection rule is bad then we could possibly inadvertently detect "failures" on successful jobs. This is mostly why I want to get the qa team involved in this to ensure we've got rules that are valid for tempest and won't do that | 22:52 |
gmann | you mean other jobs of that change will be reset and get stopped? | 22:52 |
clarkb | no, it's still only jobs for changes behind in the queue that are stopped and then moved to a new head just as they would be during normal failure | 22:52 |
clarkb | the config option works by matching job output strings and if you get a match treats the job as a failure early. If however the job succeeds now you're in a weird state | 22:53 |
clarkb | this is why we don't just have generic rules for this; it really needs to be job specific to match properly | 22:53 |
corvus | but if it takes 10 minutes to hit the first test failure, and 1 hour to run the whole job, then it means that zuul can reset the gate queue after 10 minutes instead of waiting an hour | 22:53 |
clarkb | yup and that is where the resource savings come from. We're reducing the amount of thrown away test time | 22:53 |
corvus | (so if you have a deep gate queue and a bunch of changes failing, that's 10m+10m+10m... instead of 1h+1h+1h, so things move a lot faster) | 22:54 |
gmann | clarkb: corvus: I see. | 22:55 |
gmann | I think we can try it in tempest jobs but let me think more on it. I need to go pick up my son from school now. I will ping you guys tomorrow if I have more questions. | 22:56 |
clarkb | ok | 22:56 |
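
The early failure detection clarkb describes (a regex on the job that scans output as it is written) can be illustrated with a small Python sketch. This is not Zuul's implementation; the function name, the sample log lines, and both patterns are hypothetical, chosen only to show why a loose rule risks flagging a successful job.

```python
import re
from typing import Iterable, Optional, Sequence

# Hypothetical strict rule: only lines that END with "... FAILED" count.
# This is an assumption for illustration, not a vetted rule for tempest jobs.
STRICT_PATTERNS = [re.compile(r"\.\.\. FAILED$")]

# Hypothetical loose rule: any mention of "FAILED" anywhere in a line.
LOOSE_PATTERNS = [re.compile(r"FAILED")]


def first_failure_line(lines: Iterable[str],
                       patterns: Sequence[re.Pattern]) -> Optional[int]:
    """Return the index of the first line matching any pattern, else None.

    Lines are consumed one at a time, so this also works on a stream that
    is still being written.
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in patterns):
            return index
    return None


# Made-up output from a job with one failing test.
failing_job = [
    "{0} tempest.api.example.test_a [1.20s] ... ok",
    "{1} tempest.api.example.test_b [0.80s] ... ok",
    "{0} tempest.api.example.test_c [3.10s] ... FAILED",
    "{1} tempest.api.example.test_d [0.50s] ... ok",
]

# Made-up output from a job that succeeds but mentions the word FAILED.
passing_job = [
    "Checking that no service reports FAILED state",
    "{0} tempest.api.example.test_a [1.20s] ... ok",
]

print(first_failure_line(failing_job, STRICT_PATTERNS))
print(first_failure_line(passing_job, STRICT_PATTERNS))
print(first_failure_line(passing_job, LOOSE_PATTERNS))
```

The loose rule matches the first line of the passing job, which is the "weird state" mentioned in the discussion: the job would be treated as failing early and then succeed.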
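
The window behaviour and the time savings discussed above can be sketched in a few lines of Python. The log only states that the window has a minimum of 20, grows as changes merge, and shrinks on failures, similar to TCP slow start; the exact rule used here (add one on merge, halve on failure) and both function names are assumptions for illustration, not a description of Zuul's configuration.

```python
def next_window(window: int, merged: bool, floor: int = 20,
                increase: int = 1, decrease_factor: int = 2) -> int:
    """Return the next active window size for a gate queue.

    Assumed rule: grow by `increase` when the head change merges, divide by
    `decrease_factor` on a failure, and never drop below `floor`.
    """
    if merged:
        return window + increase
    return max(floor, window // decrease_factor)


def minutes_lost_to_resets(resets: int, minutes_until_failure_known: int) -> int:
    """Total waiting time before the queue can be rearranged across resets."""
    return resets * minutes_until_failure_known


# A queue that grew to 40 falls back to the floor of 20 after one failure.
print(next_window(40, merged=False))

# With the proposed floor of 10, the same window of 20 can shrink further.
print(next_window(20, merged=False, floor=10))

# corvus's example: three failing changes, detected at job end (1 hour each)
# versus at the first failing test (10 minutes each).
print(minutes_lost_to_resets(3, 60))
print(minutes_lost_to_resets(3, 10))
```

Both levers reduce wasted test time: a lower floor limits how many changes are running when a reset happens, and early detection shortens how long those jobs run before the reset.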