@chgans:matrix.org | ^^^ Looks like it's testing time for me! :) Will try to find time tomorrow. Thanks corvus for the fix! | 10:13 |
---|---|---|
@westphahl:matrix.org | Clark corvus : could you look again at https://review.opendev.org/c/zuul/zuul/+/848234 ? | 12:27 |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 848132: zuul-web: return errors as JSON https://review.opendev.org/c/zuul/zuul/+/848132 | 14:24 | |
@fungicide:matrix.org | is anyone aware of nondeterminism in the circular deps tests? i think https://zuul.opendev.org/t/zuul/build/3cb2b13fbbd94e4aa9b2aa52e71e5c1d failed test_job_deduplication_failed_job because one of the builds it expected to succeed was aborted, but it's not clear to me why that happened. did it race the failure of a build for the change ahead of it? looks like that caused the running build to be cancelled | 15:27 |
-@gerrit:opendev.org- Jonathan Rosser proposed: [zuul/zuul-jobs] 848880: [DNM] test 848027 https://review.opendev.org/c/zuul/zuul-jobs/+/848880 | 15:38 | |
@clarkb:matrix.org | > <@westphahl:matrix.org> Clark corvus : could you look again at https://review.opendev.org/c/zuul/zuul/+/848234 ? | 15:49 |
I think my last major questions is the set(None) case as pointed out by my comment there. | ||
-@gerrit:opendev.org- Jonathan Rosser proposed wip: [zuul/zuul-jobs] 848880: [DNM] test 848027 https://review.opendev.org/c/zuul/zuul-jobs/+/848880 | 15:49 | |
@clarkb:matrix.org | > <@westphahl:matrix.org> Clark corvus : could you look again at https://review.opendev.org/c/zuul/zuul/+/848234 ? | 15:49 |
* I think my last major question is the set(None) case as pointed out by my comment there. | ||
@westphahl:matrix.org | Clark: the second call to `getProjectBranches()` should never return None. This case will lead to an early return after the first call already | 15:51 |
@westphahl:matrix.org | sorry, that probably wasn't clear in my response to your comment | 15:51 |
@clarkb:matrix.org | I see so we only have to handle that situation on the first pass whcih was done | 15:52 |
@westphahl:matrix.org | yes, unless I'm missing something (which might be the case) | 15:54 |
@jim:acmegating.com | fungi: that's new to me, but i suppose some new nondetermistic tests are likely to pop up with the new py versions. it's not immediately apparent to me what the issue is. | 16:06 |
@clarkb:matrix.org | swest: I left a super minor nit but +2'd. I think you can probably self approve and decide if you want to followup or not for that minor item. (Just not approving so that you have a chance to see the note) | 16:07 |
@westphahl:matrix.org | Clark: replied to your comment | 16:10 |
@clarkb:matrix.org | Oh right None being overloaded. I think you are correct | 16:11 |
@clarkb:matrix.org | Approved thanks | 16:12 |
@jim:acmegating.com | Clark: while you're at it, can you look at the followup test for that? https://review.opendev.org/848524 | 16:13 |
@clarkb:matrix.org | will do | 16:15 |
-@gerrit:opendev.org- Jonathan Rosser proposed wip: [zuul/zuul-jobs] 848880: [DNM] test 848027 and 848883 https://review.opendev.org/c/zuul/zuul-jobs/+/848880 | 16:40 | |
@fungicide:matrix.org | https://zuul.opendev.org/t/zuul/build/b08bfd5c47a14cbeafc1d85966d294bc is another identical failure in test_job_deduplication_failed_job | 17:31 |
@jim:acmegating.com | guess it's time to dig into it :) | 17:32 |
@fungicide:matrix.org | starting to wonder if it's become suddenly persistent | 17:32 |
@fungicide:matrix.org | now i'm curious how we were able to add the py310 job successfully | 17:32 |
@fungicide:matrix.org | looking through the build history, i think it may have a 50% success rate. checking some failures for other changes now to see if they're failing the same way | 17:34 |
@fungicide:matrix.org | yeah, https://zuul.opendev.org/t/zuul/build/a8ef2a83bc614a19895b7e3e2ac61eec is for a different change and also failed only test_job_deduplication_failed_job though the synthetic build outcomes were slightly different (two aborts instead of one) | 17:37 |
@fungicide:matrix.org | is there a way to enforce build completion order in a test? | 17:37 |
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 848234: Fix read-only branches error in zuul-web https://review.opendev.org/c/zuul/zuul/+/848234 | 17:38 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 848524: Add branch cache fetch failure tests https://review.opendev.org/c/zuul/zuul/+/848524 | 17:38 | |
@fungicide:matrix.org | the commonality seems to be that at least one build is getting aborted (sometimes more) due to failure of the change ahead | 17:38 |
@fungicide:matrix.org | which i think is the correct behavior if the first change has a failure result for a build before all the builds for the second change complete | 17:39 |
@fungicide:matrix.org | but since this also involves circular dependencies and deduplication of an extra result, i'm not entirely sure | 17:40 |
@fungicide:matrix.org | is the deduplication behavior itself deterministic as to which result it "keeps" and which it "deduplicates"? | 17:41 |
@jim:acmegating.com | we can enforce build order by holding jobs in build and then releasing them selectively; but yeah, i don't know if that's what we need to do here or if it's something else we need to accomodate in the test. | 17:46 |
@jim:acmegating.com | the build it uses should just be the for the first item in the pipeline | 17:47 |
@fungicide:matrix.org | so in that case we shouldn't see an abort bleeding through from the same job in second change? | 17:48 |
@fungicide:matrix.org | i'll read back through the log again, since i'm probably misinterpreting what it's trying to tell me | 17:49 |
@fungicide:matrix.org | ah, okay, common-job is the one common to both changes and is getting deduplicated with a failure result consistently across all the examples i've looked at | 17:52 |
@fungicide:matrix.org | but project1-job and project2-job seem to randomly report either success or aborted | 17:53 |
@fungicide:matrix.org | also the build history is being returned in a seemingly random order, while the test assumes a fixed order | 17:58 |
@fungicide:matrix.org | oh, never mind that second bit. the assert does specify ordered=False so i guess that's irrelevant | 17:58 |
@jim:acmegating.com | oh... i think i may be starting to understand | 18:05 |
@jim:acmegating.com | i think the race is between whether the project*-job jobs finish before common-job fails. if they finish, then we get the success result. if they aren't done yet, then the fact that change #2 is complete and part of the bundle but failed means that zuul decides to abort jobs on change #1 | 18:07 |
@jim:acmegating.com | in other words, we either need to make sure the project jobs finish before the common job fails, or we need to not care whether the result is aborted or not | 18:07 |
@jim:acmegating.com | fungi: i have a fix; i'll push it up in a second | 18:14 |
@fungicide:matrix.org | oh, yes that makes sense, okay | 18:15 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 848887: Fix race in test_job_deduplication_failed_job https://review.opendev.org/c/zuul/zuul/+/848887 | 18:24 | |
@jim:acmegating.com | fungi: ^ | 18:24 |
@clarkb:matrix.org | corvus: I haven't looked at that too closely but shouldn't zuul handle that without strict ordering in the test so that rael world usage functions? | 18:29 |
@jim:acmegating.com | Clark: i think real world usage is we don't care | 18:30 |
@clarkb:matrix.org | I guess in the real world an abort is maybe fine when a job fails like that but we're trying to assert stable behavior so need it to be deterministic? | 18:30 |
@jim:acmegating.com | yep exactly that | 18:30 |
@clarkb:matrix.org | ya ok that makes sense. I left a note about something else inline too | 18:31 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 848887: Fix race in test_job_deduplication_failed_job https://review.opendev.org/c/zuul/zuul/+/848887 | 18:32 | |
@jim:acmegating.com | thanks fixed :) | 18:32 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 848890: Handle non default loopvars in Ansible callback stream plugin https://review.opendev.org/c/zuul/zuul/+/848890 | 18:51 | |
@clarkb:matrix.org | So we don't forget ^ but that lacks a test currently | 18:51 |
@clarkb:matrix.org | I'm not sure I'll get to adding a test today as I feel like I have a million other things to do first | 18:51 |
@clarkb:matrix.org | At the very least that should check we don't regress? | 18:51 |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 848014: Report timing info for POST_FAILURE and TIMED_OUT builds https://review.opendev.org/c/zuul/zuul/+/848014 | 18:53 | |
@jim:acmegating.com | Clark: on my workstation, zuul unit tests are 2x faster under py310 than py39 | 20:32 |
@jim:acmegating.com | * Clark: on my workstation, zuul unit tests take 50% of the time under py310 compared to py39 | 20:33 |
@jim:acmegating.com | py39: | 20:33 |
Ran: 1325 tests in 747.7551 sec. | ||
Sum of execute time for each test: 14749.2265 sec. | ||
py310: | ||
Ran: 1328 tests in 384.7824 sec. | ||
Sum of execute time for each test: 7170.3903 sec. | ||
@jim:acmegating.com | i tend to get very consistent times on my test runs too, typically within 10 seconds variation in the wall-clock time. | 20:34 |
@clarkb:matrix.org | nice. And python3.11 is expected to be 10-60% quicker too | 20:37 |
@clarkb:matrix.org | quicker than 3.10 I mean | 20:37 |
@jim:acmegating.com | how soon can we upgrade? ;) | 20:38 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 848890: Handle non default loopvars in Ansible callback stream plugin https://review.opendev.org/c/zuul/zuul/+/848890 | 20:38 | |
@clarkb:matrix.org | 3.11 is expected to release early october. I suspect we will be able to update not long after that? Thats my hope anyway. Then also maybe taking some time to refactor bits of zuul to be more consistent in their shape so that python3.11 can do its best to run it quickly | 20:39 |
@clarkb:matrix.org | I added a test in 848890 but haven't run it locally yet | 20:39 |
@clarkb:matrix.org | but I think I got the general shape correct | 20:40 |
@clarkb:matrix.org | we might want to add more cases to the testing to handle the other bits that I updated rather than just the instance we noticed with the random selection | 20:40 |
@jim:acmegating.com | opendev's cpu use is not high, so if there's going to be a difference, we might have to squint to see it. so far the difference on zuul01 from july 5 -> july 6 looks like it's within the error bars (eg, about the same as jun 30 -> july 1) | 20:41 |
@jim:acmegating.com | i'm not sure how else we could quantify it. maybe aggregate loop runtime, but network ops are the bulk of that time so again it may be lost in the noise. | 20:41 |
@clarkb:matrix.org | corvus: ya and some things that zuul does a lot of like json (de)serialization is on the lwoer end of the speedups. But they do show speedups just not as impressively | 20:42 |
@jim:acmegating.com | ++ | 20:42 |
@clarkb:matrix.org | regex and json ops according ot benchmarks do improve but not as crazily as some of the more mathy benchmarks | 20:42 |
@clarkb:matrix.org | I guess regexes are very mathy, but you probably know what I mean | 20:42 |
@jim:acmegating.com | https://grafana.opendev.org/d/2e89fb78e5/zuul-performance-metrics?orgId=1&from=1656535509194&to=1657140309194&var-tenant=openstack&viewPanel=17 | 20:45 |
@jim:acmegating.com | ooh there's a discontinuity | 20:45 |
@jim:acmegating.com | like maybe a 20% decrease in layout generation time for the openstack tenant | 20:45 |
@clarkb:matrix.org | I think for 3.10 one of the changes they made was to thread scheduling | 20:48 |
@clarkb:matrix.org | which may explain why testing is so much quicker? | 20:49 |
@jim:acmegating.com | could be -- though testing only has a couple more threads than production (since each test runs in its own process) | 20:50 |
@clarkb:matrix.org | https://bugs.python.org/issue42093 is apparently a big python3.10 performance change | 20:57 |
@clarkb:matrix.org | Upstream definitely seems interested in performance these days which is great. Hopefully 3.12 can keep the trned going :) | 20:57 |
@fungicide:matrix.org | yeah, there's been a slew of peps for speedups in various parts of the interpreter. from what i can see a lot of that is being driven by enterprise users who chose python and then decided things were running too slowly so have been looking for anywhere to improve that safely | 21:39 |
@fungicide:matrix.org | e.g. the cinder (no not that cinder) work meta has started to open up | 21:40 |
@jim:acmegating.com | > <@fungicide:matrix.org> yeah, there's been a slew of peps for speedups in various parts of the interpreter. from what i can see a lot of that is being driven by enterprise users who chose python and then decided things were running too slowly so have been looking for anywhere to improve that safely | 21:50 |
i think every python dev can relate to that :) | ||
@clarkb:matrix.org | wow 848890 passes testing. I'll let reviewers chime in on whether or not they think we need to cover more of the cases of non default loop vars | 22:52 |
@jim:acmegating.com | Clark: that lgtm but i left a nit i suspect you may want to fix :) | 22:55 |
@jim:acmegating.com | Clark: and fwiw, i think we should include the fallback so that we don't make any more assumptions than necessary about ansible input | 22:56 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 848910: Apply timer trigger jitter to project-branches https://review.opendev.org/c/zuul/zuul/+/848910 | 22:57 | |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 848890: Handle non default loopvars in Ansible callback stream plugin https://review.opendev.org/c/zuul/zuul/+/848890 | 22:57 | |
@clarkb:matrix.org | corvus: should I remove the todo then? | 22:57 |
@clarkb:matrix.org | I can update it to make a note about not assuming things about ansible apis | 22:58 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 848890: Handle non default loopvars in Ansible callback stream plugin https://review.opendev.org/c/zuul/zuul/+/848890 | 22:59 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!