Monday, 2021-10-25

-@gerrit:opendev.org- Simon Westphahl proposed:10:31
- [zuul/zuul] 810920: Store change queues in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810920
- [zuul/zuul] 811422: Save and restore bundle with item in Zookeeper https://review.opendev.org/c/zuul/zuul/+/811422
- [zuul/zuul] 811955: Pass ZK context to deserialize method of ZKObjects https://review.opendev.org/c/zuul/zuul/+/811955
- [zuul/zuul] 812450: Move ZuulMark from configloader to model https://review.opendev.org/c/zuul/zuul/+/812450
- [zuul/zuul] 812451: Recursively delete all sub-nodes of ZKObjects https://review.opendev.org/c/zuul/zuul/+/812451
- [zuul/zuul] 812466: Only retry ZK operations for Kazoo exceptions https://review.opendev.org/c/zuul/zuul/+/812466
- [zuul/zuul] 812452: Store build sets in Zookeeper https://review.opendev.org/c/zuul/zuul/+/812452
- [zuul/zuul] 812467: Add support for sharded ZKObjects https://review.opendev.org/c/zuul/zuul/+/812467
- [zuul/zuul] 812673: Store RepoFiles for a build set in Zookeeper https://review.opendev.org/c/zuul/zuul/+/812673
- [zuul/zuul] 813805: Remove project pipeline config from queue item https://review.opendev.org/c/zuul/zuul/+/813805
- [zuul/zuul] 813809: Lookup event class names from global symbol table https://review.opendev.org/c/zuul/zuul/+/813809
- [zuul/zuul] 813826: Store and resolve queue item's ahead/behind refs https://review.opendev.org/c/zuul/zuul/+/813826
- [zuul/zuul] 814544: Cleanup stale items after refreshing a pipeline https://review.opendev.org/c/zuul/zuul/+/814544
- [zuul/zuul] 814570: Reference active change queues in pipeline state https://review.opendev.org/c/zuul/zuul/+/814570
- [zuul/zuul] 814571: Update pipeline state when modifying attributes https://review.opendev.org/c/zuul/zuul/+/814571
- [zuul/zuul] 814772: Allow passing extra attributes to ZKObject.fromZK https://review.opendev.org/c/zuul/zuul/+/814772
- [zuul/zuul] 814862: Bail out when a project moves between connections https://review.opendev.org/c/zuul/zuul/+/814862
- [zuul/zuul] 814773: Move re-enqueue to pipeline processing https://review.opendev.org/c/zuul/zuul/+/814773
- [zuul/zuul] 814899: Delete old build sets immediately https://review.opendev.org/c/zuul/zuul/+/814899
- [zuul/zuul] 815111: Store builds in Zookeeper https://review.opendev.org/c/zuul/zuul/+/815111
- [zuul/zuul] 815276: Add change queues to change queue managers https://review.opendev.org/c/zuul/zuul/+/815276
- [zuul/zuul] 815277: Refresh pipelines before checking for empty queues https://review.opendev.org/c/zuul/zuul/+/815277
- [zuul/zuul] 815278: DNM: execute tests with two schedulers https://review.opendev.org/c/zuul/zuul/+/815278
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com:10:31
- [zuul/zuul] 812750: Add LocalZKContext for job freezing https://review.opendev.org/c/zuul/zuul/+/812750
- [zuul/zuul] 812760: Add RepoState object https://review.opendev.org/c/zuul/zuul/+/812760
- [zuul/zuul] 813552: Remove Worker class https://review.opendev.org/c/zuul/zuul/+/813552
- [zuul/zuul] 813895: Move job_graph attribute to BuildSet https://review.opendev.org/c/zuul/zuul/+/813895
- [zuul/zuul] 813913: Serialize JobGraph objects to ZK https://review.opendev.org/c/zuul/zuul/+/813913
- [zuul/zuul] 814065: Serialize ProjectMetadata on JobGraph https://review.opendev.org/c/zuul/zuul/+/814065
- [zuul/zuul] 814071: Add test_freeze_noop_job https://review.opendev.org/c/zuul/zuul/+/814071
- [zuul/zuul] 814069: Remove setBase from job freeze API https://review.opendev.org/c/zuul/zuul/+/814069
- [zuul/zuul] 814070: Create Abstract and FrozenJob classes https://review.opendev.org/c/zuul/zuul/+/814070
- [zuul/zuul] 814242: Make FrozenJob.updateParentData a static method https://review.opendev.org/c/zuul/zuul/+/814242
- [zuul/zuul] 814281: Remove toDict from FrozenJob https://review.opendev.org/c/zuul/zuul/+/814281
- [zuul/zuul] 814243: Make FrozenJob a ZKObject https://review.opendev.org/c/zuul/zuul/+/814243
- [zuul/zuul] 814329: Implement frozen job serialization/deserialization https://review.opendev.org/c/zuul/zuul/+/814329
- [zuul/zuul] 814679: Store FrozenJob data in separate znodes https://review.opendev.org/c/zuul/zuul/+/814679
- [zuul/zuul] 815154: Update test_inventory to be ZK-friendly https://review.opendev.org/c/zuul/zuul/+/815154
@kain99:matrix.org:q13:01
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 815305: Example docker-compose: support podman https://review.opendev.org/c/zuul/zuul/+/815305 13:41
@mhuin:matrix.org^ super trivial fix to support short name resolution enforcement in podman, so that people who don't use Docker can still play with the compose13:42
@mhuin:matrix.org(Fedora packs podman by default, for example)13:42
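Podman can be configured (via `short-name-mode = "enforcing"` in containers-registries.conf) to refuse unqualified image names such as `zuul/zuul-scheduler`. Assuming the compose fix amounts to spelling out the registry, the sketch below shows the usual Docker qualification rule; `qualify_image` is a hypothetical helper, not part of change 815305.

```python
def qualify_image(image: str) -> str:
    """Return a fully qualified image reference using Docker's defaults.

    The first path component counts as a registry only if it contains
    '.' or ':' or is 'localhost'; otherwise docker.io is assumed, and
    single-component (official) images live under 'library/'.
    """
    first, sep, _rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return image  # already names a registry
    if not sep:
        return f"docker.io/library/{image}"
    return f"docker.io/{image}"
```

For example, `qualify_image("zuul/zuul-scheduler")` gives `docker.io/zuul/zuul-scheduler`, which podman accepts without a short-name prompt.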
@ashleybullock:matrix.orgHey, has anyone seen the following error when trying to upgrade to zuul 4.0, using a mysql database connection (set up as [database] in the zuul.conf) and the web & scheduler are both erroring with: 13:42
2021-10-25 13:27:02,525 ERROR zuul.WebServer: File "/usr/local/lib/python3.6/site-packages/zuul/driver/sql/sqlconnection.py", line 314, in _setup_models
2021-10-25 13:27:02,525 ERROR zuul.WebServer: Base = orm.declarative_base(metadata=self.metadata)
2021-10-25 13:27:02,525 ERROR zuul.WebServer: AttributeError: module 'sqlalchemy.orm' has no attribute 'declarative_base'
Is anyone using sqlalchemy and seen something similar?
@mhuin:matrix.orgjust a hunch, but do you have the right version of sqlalchemy ? check the requirements.txt file maybe?13:44
@clarkb:matrix.orgYa that was fixed in zuul at some point and the issue is caused by a newer sqlalchemy/alembic pair. Check the git log for declarative_base and it should show up13:45
-@gerrit:opendev.org- Simon Westphahl proposed:13:54
- [zuul/zuul] 815111: Store builds in Zookeeper https://review.opendev.org/c/zuul/zuul/+/815111
- [zuul/zuul] 815276: Add change queues to change queue managers https://review.opendev.org/c/zuul/zuul/+/815276
- [zuul/zuul] 815277: Refresh pipelines before checking for empty queues https://review.opendev.org/c/zuul/zuul/+/815277
- [zuul/zuul] 815309: Cancel jobs before resetting builds https://review.opendev.org/c/zuul/zuul/+/815309
@fungicide:matrix.orgAshley Bullock: it's fixed in zuul 4.9.0 by https://review.opendev.org/804456 so if you're trying to use zuul<4.9 you need sqlalchemy<2 until you upgrade zuul further14:40
@fungicide:matrix.orgAshley Bullock: once you solve that, you may also discover that you need to pin alembic<1.7 until you reach zuul 4.9.014:44
@ashleybullock:matrix.orgThanks both, I'll give that a try now15:18
@clarkb:matrix.orgcorvus: the time results of https://review.opendev.org/c/zuul/zuul/+/815205 look promising. Though the fail rate could be a problem if that is related to the sizing15:46
@jim:acmegating.comClark: agree; at first glance, none of those look like resource-related failures, they may just be latent races15:49
@jim:acmegating.comClark: think i should just swap out 815077 with that?15:51
@jim:acmegating.comor, really, i guess just drop the 32G part of that change?15:52
@clarkb:matrix.orgYa I think if the smaller nodes show similar improvements via uncapped concurrency then you're better off with access to nodes in all clouds from a redundancy standpoint15:52
@jim:acmegating.comthen separately, i can try to see if i can spot the races in those failures.15:52
@jim:acmegating.comokay, will update that change15:52
@clarkb:matrix.organd you can always go larger later if necessary but harder to go smaller (as things have a tendency to grow to fit)15:52
@jim:acmegating.comtobiash: ^ fyi15:52
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 815077: Uncap concurrency in tests https://review.opendev.org/c/zuul/zuul/+/815077 15:54
@spamaps:spamaps.ems.hostIs this page totally broken or is it just me? https://zuul-ci.org/docs/zuul/howtos/cross-project-gating.html#cross-project-gating16:26
@jim:acmegating.comit seems very short16:27
@spamaps:spamaps.ems.hostAnd on the same thread.. anyone have a good link for somebody to learn about Depends-On?16:27
@spamaps:spamaps.ems.hostI can't seem to find one in the docs.16:27
@jim:acmegating.comspamaps: https://zuul-ci.org/docs/zuul/discussion/gating.html16:30
@jim:acmegating.comspamaps: specifically https://zuul-ci.org/docs/zuul/discussion/gating.html#cross-project-dependencies16:30
@jim:acmegating.comspamaps: that's underneath this page, "zuul concepts" which may be useful too (based on your line of questioning): https://zuul-ci.org/docs/zuul/discussion/concepts.html16:32
@fungicide:matrix.orglooks like that extremely brief document was added several years ago by https://review.opendev.org/571420 and never filled out further16:43
@fungicide:matrix.orgspamaps: we also have a glossary if you're looking for brief definitions of terms: https://zuul-ci.org/docs/zuul/reference/glossary.html#term-cross-project-dependency16:46
@clarkb:matrix.orgianw: out of curiosity why do we stick with python-3.7 on https://review.opendev.org/c/zuul/nodepool/+/806312 ? Zuul is running 3.8. I'll push a followup change to see if 3.8 is a viable option (would be nice to keep those two in sync I think)17:25
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/nodepool] 815341: Update docker image to python3.8 https://review.opendev.org/c/zuul/nodepool/+/815341 17:28
-@gerrit:opendev.org- Tobias Henkel proposed on behalf of Simon Westphahl: [zuul/zuul] 815278: DNM: execute tests with two schedulers https://review.opendev.org/c/zuul/zuul/+/815278 17:30
@jim:acmegating.comtobiash: https://review.opendev.org/815077 look okay?  i dropped the 32GB nodes since it looks like we may be able to get by with 8G (but will probably have some races to fix)17:33
@clarkb:matrix.org> <@clarkb:matrix.org> ianw: out of curiosity why do we stick with python-3.7 on https://review.opendev.org/c/zuul/nodepool/+/806312 ? Zuul is running 3.8. I'll push a followup change to see if 3.8 is a viable option (would be nice to keep those two in sync I think)17:38
Oh there is a followup change to switch to 3.9. I guess the reason for 3.9 is the arm64 wheels are already built for bullseye.
@clarkb:matrix.orgI think the "problem" with 3.9 is none of our other testing is done on that python.17:39
@jim:acmegating.comyeah... our unit test jobs are 3.817:39
@clarkb:matrix.orgcorvus: ^ I know we've been head down on other things but have there been any thoughts or conversations on bumping up to 3.9?17:39
@jim:acmegating.comClark: i know of no reason not to start bringing things up to 3.917:40
@jim:acmegating.comi run zuul unit tests locally under 3.917:40
@jim:acmegating.comi agree with you, we should try to keep everything in sync17:40
@clarkb:matrix.orgok cool, we should be able to bump nodepool and zuul up to 3.9 in that case17:40
@clarkb:matrix.orglet me see about pushing some changes for that17:40
@jim:acmegating.comi think the last blocker for tox-py39 in zuul was gearman, and we pinned?17:41
@clarkb:matrix.orgyes gearman is pinned. Does the pinned version not work with newer python?17:41
@clarkb:matrix.org(it must if you run 3.9 locally)17:41
@jim:acmegating.commy local unit tests run with gear 0.15.117:44
@jim:acmegating.comwhich is the pinned version17:44
@jim:acmegating.commaybe the issue was xenial?17:44
@jim:acmegating.comi'm running hirsute17:45
@jim:acmegating.comClark: https://review.opendev.org/789654 is relevant17:47
@jim:acmegating.comso maybe we still have the gear issue.17:47
@jim:acmegating.comperhaps we could just stick with 3.8 a little bit longer so we can get through the SOS work without throwing that into the mix?17:48
@clarkb:matrix.orgah ok it is related to the TLS stuff, I guess that isn't too surprising17:48
@clarkb:matrix.orgya in that case I'm kind of thinking we update nodepool to 3.8 as well. Then later we can bump everything to 3.917:48
@jim:acmegating.com++17:48
@clarkb:matrix.orgI see a few improvements we can make to the existing testing though. I'll get some patches up shortly17:49
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Ian Wienand: [zuul/zuul] 789654: Update to Python 3.9 https://review.opendev.org/c/zuul/zuul/+/789654 17:51
@jim:acmegating.comClark: ^ i refreshed ianw's change just to see what it does now.  if we can move to 3.9 without a fuss, great.  but if it's going to raise the gear tls issue i'd rather just focus on finishing up the removal.17:52
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 815343: Couple of CI consistency cleanups https://review.opendev.org/c/zuul/zuul/+/815343 17:53
@clarkb:matrix.orgcorvus: ^ I noticed those two things when looking at py3.917:53
@jim:acmegating.comlgtm17:54
@clarkb:matrix.orgI half wonder if py38 on focal has the same gear tls issues17:54
@clarkb:matrix.orgbut that will tell us17:54
@jim:acmegating.comya17:55
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:17:55
- [zuul/zuul] 809293: Add an API for ZK-backed objects https://review.opendev.org/c/zuul/zuul/+/809293
- [zuul/zuul] 810328: Temporarily enqueue cycle changes for reporting https://review.opendev.org/c/zuul/zuul/+/810328
- [zuul/zuul] 811244: Dequeue items after we're done with them https://review.opendev.org/c/zuul/zuul/+/811244
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl:17:55
- [zuul/zuul] 809532: Simplified attribute API for ZKObjects https://review.opendev.org/c/zuul/zuul/+/809532
- [zuul/zuul] 809414: Make QueueItem a Zookeeper object https://review.opendev.org/c/zuul/zuul/+/809414
@jim:acmegating.comClark: ^ i just rebased the first part of the stack on the new test change.17:56
@clarkb:matrix.orgAnd left a comment on the first nodepool change to address the job requires thing. But seems like we've got a path forward here at least.17:59
@clarkb:matrix.orgcorvus: are those safe to start approving as I go through them, or is this something we want to try and land in as short a period of time as possible after reviewing most of the stack?18:01
@jim:acmegating.comClark: i think it's "safe" but weird to half-land it.  so i think now that we've got the end of the stack written, it's probably okay to start merging them now.  i think it'll be okay if it takes days to land the whole thing18:03
@jim:acmegating.comi think if we restarted opendev in the middle, it shouldn't hurt.18:04
@jim:acmegating.comjust not much of a reason to, and i would avoid it if possible.18:04
@jim:acmegating.comall that to say, i think i lean just ever so slightly on the side of "lets set +w on them as we go"18:04
@clarkb:matrix.orgok I'll try to keep an eye out for anything that looks unsafe if landed midway through the stack and we restart before the rest of the stack lands18:05
@jim:acmegating.comthat reminds me, i would like to tag what opendev restarted on.  this is one commit after that, which adds a release note:18:06
@jim:acmegating.comcommit 281d2f026b8198d579e39918a8d35cf19178831a (tag: 4.10.4, refs/changes/75/815175/1)18:06
@jim:acmegating.comzuul-maint: ^ does that look good?18:06
@jim:acmegating.comClark: then re sos -- that would be our rollback target if something goes wrong18:06
@clarkb:matrix.orgsounds like a plan. I'll try to dig into those changes after lunch likely18:07
@jim:acmegating.comi will slowly rebase and review them as well18:09
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 815077: Uncap concurrency in tests https://review.opendev.org/c/zuul/zuul/+/815077 18:46
@srigowthami:matrix.orgHi All,19:04
We have an OpenStack CI setup using zuul v3 and nodepool v3. Recently, due to a power outage, all the CI VMs went down, and we have since recovered them.
After recovery I noticed the errors below in zuul/debug.log:
Cmd('git') failed due to: exit code(128)
cmdline: git ls-remote --heads --tags https://opendev.org/zuul/zuul-jobs
stderr: 'fatal: unable to access 'https://opendev.org/zuul/zuul-jobs/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none'
The configuration has not changed, but the OpenStack repos aren't getting updated to the master branch because of the error above.
Could you please help me with this?
@clarkb:matrix.orgsrigowthami is your base operating system up to date? That seems likely related to the let's encrypt root cert expiry19:10
@fungicide:matrix.orgcorvus: should it be 4.11.0 since it adds a new api endpoint? otherwise yes that looks good to me19:14
@fungicide:matrix.orgi'm not sure how fastidious we've been about things like that in the past, so 4.10.4 also seems a reasonable choice19:15
@srigowthami:matrix.org> <@clarkb:matrix.org> srigowthami is your base operating system up to date? That seems likely related to the let's encrypt root cert expiry19:16
Clark: I am using xenial and the os on the zuul vm is up to date.
@fungicide:matrix.orgsrigowthami: ubuntu xenial is no longer receiving standard support from canonical, so unless you're getting esm from them you're running an end of life ubuntu version19:17
@jim:acmegating.comfungi: i don't think we make any guarantees about the api at this point.19:18
@jim:acmegating.comso i think .4 is still appropriate19:18
@fungicide:matrix.orgcorvus: cool, then patchlevel increase there seems legitimate19:18
@srigowthami:matrix.org> <@fungicide:matrix.org> srigowthami: ubuntu xenial is no longer receiving standard support from canonical, so unless you're getting esm from them you're running an end of life ubuntu version19:30
Yes, I have to update the OS to Bionic. Xenial standard support ended on April 30, and it was working without any OS upgrade until now. I want to bring up this cluster and re-deploy another setup with Bionic.
Could you suggest whether I can recover the VM some other way?
@fungicide:matrix.orgsrigowthami: upgrade it to a newer ubuntu version (like bionic/18.04 lts), purchase extended support from canonical for xenial, rebuild on a different distribution with newer trust chain and openssl version, or try to backport fixes to xenial yourself for the let's encrypt root ca change19:33
@fungicide:matrix.orgsrigowthami: you can find some information on the let's encrypt root cert expiration here... https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/19:41
@fungicide:matrix.orgsrigowthami: and this is a notice about ubuntu xenial reaching its end of standard support earlier this year... https://ubuntu.com/blog/ubuntu-16-04-lts-transitions-to-extended-security-maintenance-esm19:42
@fungicide:matrix.orgsrigowthami: anyway, your problem isn't specific to zuul, it's that the git client on ubuntu xenial can no longer communicate with remotes which have certs from let's encrypt19:44
@srigowthami:matrix.orgYes, the issue isn't with Zuul but with the git client on Xenial itself. Thank you for the way forward, Clark and fungi 19:52
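The diagnosis hinges on a date: the DST Root CA X3 certificate that older Let's Encrypt chains led to expired on 30 September 2021 (14:01:15 GMT, per Let's Encrypt's notice linked above). The sketch below only checks a certificate notAfter string against a given time with the standard library; it does not reproduce xenial's TLS stack, whose chain building is what actually fails.

```python
import ssl
from datetime import datetime, timezone

# notAfter of the expired DST Root CA X3 certificate, in the string format
# that ssl.cert_time_to_seconds() expects.
DST_ROOT_CA_X3_NOT_AFTER = "Sep 30 14:01:15 2021 GMT"


def is_expired(not_after: str, now: float) -> bool:
    """Return True if the certificate's notAfter time is earlier than `now` (epoch seconds)."""
    return ssl.cert_time_to_seconds(not_after) < now


if __name__ == "__main__":
    day_of_chat = datetime(2021, 10, 25, tzinfo=timezone.utc).timestamp()
    print(is_expired(DST_ROOT_CA_X3_NOT_AFTER, day_of_chat))  # True
```

Clients that still build their chain up to that root fail verification after that moment even though nothing in their own configuration changed, which matches what was seen here.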
@clarkb:matrix.orgcorvus: left comments on https://review.opendev.org/c/zuul/zuul/+/809293 if you get a chance to take a look20:09
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 815343: CI image requires consistency cleanup https://review.opendev.org/c/zuul/zuul/+/815343 20:16
@clarkb:matrix.orgcorvus: ^ ya I think it is focal that struggles with the gear tls situation. I've updated ^ to fix the image requires issue but not do python3.8 on focal20:17
@jim:acmegating.comClark: sgtm.20:18
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 809293: Add an API for ZK-backed objects https://review.opendev.org/c/zuul/zuul/+/809293 20:22
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:20:22
- [zuul/zuul] 810328: Temporarily enqueue cycle changes for reporting https://review.opendev.org/c/zuul/zuul/+/810328
- [zuul/zuul] 811244: Dequeue items after we're done with them https://review.opendev.org/c/zuul/zuul/+/811244
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl:20:22
- [zuul/zuul] 809532: Simplified attribute API for ZKObjects https://review.opendev.org/c/zuul/zuul/+/809532
- [zuul/zuul] 809414: Make QueueItem a Zookeeper object https://review.opendev.org/c/zuul/zuul/+/809414
@jim:acmegating.comClark: ^ fixed20:23
@jim:acmegating.compushed 4.10.420:24
@clarkb:matrix.orgcorvus: a stray print made it into https://review.opendev.org/c/zuul/zuul/+/809293/11..12/zuul/zk/zkobject.py otherwise lgtm20:26
@jim:acmegating.comsigh20:26
@tobias.henkel:matrix.orgcorvus: since you just pushed 809293, you may want to remove a debug leftover20:27
@jim:acmegating.comdone20:27
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:20:27
- [zuul/zuul] 809293: Add an API for ZK-backed objects https://review.opendev.org/c/zuul/zuul/+/809293
- [zuul/zuul] 810328: Temporarily enqueue cycle changes for reporting https://review.opendev.org/c/zuul/zuul/+/810328
- [zuul/zuul] 811244: Dequeue items after we're done with them https://review.opendev.org/c/zuul/zuul/+/811244
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl:20:27
- [zuul/zuul] 809532: Simplified attribute API for ZKObjects https://review.opendev.org/c/zuul/zuul/+/809532
- [zuul/zuul] 809414: Make QueueItem a Zookeeper object https://review.opendev.org/c/zuul/zuul/+/809414
@jim:acmegating.comsorry about that.  i threw that in there to make sure i was getting the right path in the test :)20:28
@clarkb:matrix.orgcorvus: you got the +2s on the stack through https://review.opendev.org/c/zuul/zuul/+/809414/ if you want to approve them now. I'm going to work on reviewing 809414 next (I haven't done a review of it yet)20:30
@clarkb:matrix.organd doing that should be safe for opendev now that 4.10.4 has been pushed20:31
@clarkb:matrix.orgcorvus: another thought is whether or not the zkobject api should do sharded zk data due to the 1MB limit?20:37
@clarkb:matrix.orgseems like we could go over that depending on the situation?20:38
@jim:acmegating.comClark: that shows up in 812467, about 8 changes or so further down the stack20:38
@jim:acmegating.comClark: not all objects are sharded, just the big ones20:38
@jim:acmegating.comwe could shard everything, but there's a little overhead to sharding, so i think it makes sense to be selective about it20:39
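ZooKeeper limits the data in a single znode to roughly 1MB by default, so large objects have to be split. A minimal sketch of the idea, with a plain dict standing in for ZooKeeper and a hypothetical 512KiB shard size; this is not the implementation in 812467.

```python
from typing import Dict, List

# Hypothetical shard size, kept well under ZooKeeper's default ~1MB znode limit.
SHARD_SIZE = 512 * 1024


def write_sharded(store: Dict[str, bytes], path: str, data: bytes,
                  shard_size: int = SHARD_SIZE) -> List[str]:
    """Split `data` into fixed-size chunks stored under numbered child paths."""
    paths = []
    for index, start in enumerate(range(0, len(data), shard_size)):
        # Zero-padded names sort lexicographically in numeric order.
        child = f"{path}/{index:010d}"
        store[child] = data[start:start + shard_size]
        paths.append(child)
    return paths


def read_sharded(store: Dict[str, bytes], path: str) -> bytes:
    """Reassemble the chunks under `path` in index order."""
    prefix = path + "/"
    children = sorted(key for key in store if key.startswith(prefix))
    return b"".join(store[key] for key in children)
```

Every shard costs an extra node and round trip, which is the overhead that argues for sharding only the objects that can actually grow large.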
@clarkb:matrix.orgmakes sense and good to know that has already been thought of20:39
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] 810658: Store pipeline state in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810658 21:20
@jim:acmegating.comthat's where i've stopped +2ing for now --21:21
@jim:acmegating.comswest: https://review.opendev.org/810920 has a TODO which i think is done (left note inline), but would like you to confirm that's correct21:21
@clarkb:matrix.orgcorvus: swest left some thoughts/questions on https://review.opendev.org/c/zuul/zuul/+/809414 I don't think any are really worthy of a -1 but I don't know that I'm ready to +2 yet either :)21:22
@clarkb:matrix.orgcorvus: making sure I understand correctly with https://review.opendev.org/c/zuul/zuul/+/810658 the queues are still in memory just handled by the class that will eventually serialize them out to zookeeper until that happens?21:41
@clarkb:matrix.orgsmall nit on that change but otherwise I think it is fine (assuming I've understood correctly)21:42
@jim:acmegating.comClark: replied on 809414; let me know if that helps or hinders :)21:43
@jim:acmegating.comClark: and yes, re 810658, basically the way i look at this whole stack is that what's in memory is canonical, and we're just writing more and more stuff into zk (and reading it back) for no reason whatsoever.  :)  then, by the time we get to the end of the stack, everything should be written to/read from zk, and suddenly it will all have a purpose after all.21:45
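The "write it out and read it back" step can be pictured as a serialization round trip while the in-memory object stays canonical. A minimal sketch, assuming JSON serialization and a dict standing in for ZooKeeper; `ExampleState`, `save` and `load` are hypothetical names, not Zuul's ZKObject API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ExampleState:
    """Hypothetical stand-in for a ZK-backed model object."""
    pipeline: str
    queue_names: List[str]


def save(store: Dict[str, bytes], path: str, obj: ExampleState) -> None:
    """Serialize the in-memory object and write it to the (fake) store."""
    store[path] = json.dumps(asdict(obj), sort_keys=True).encode("utf8")


def load(store: Dict[str, bytes], path: str) -> ExampleState:
    """Read the data back and rebuild an object from it."""
    return ExampleState(**json.loads(store[path].decode("utf8")))
```

Comparing `load(...)` against the in-memory original checks that serialization is lossless long before anything depends on the stored copy.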
@clarkb:matrix.org> <@jim:acmegating.com> Clark: replied on 809414; let me know if that helps or hinders :)21:46
That helped, I +2'd
@jim:acmegating.comdoing it in bite sized chunks with no consequences i think leaves us with more reviewable changes21:46
@jim:acmegating.comcause i'm pretty sure by the end of this, like every line in model.py will have changed :)21:46
@clarkb:matrix.orgI see. I figured that was what was going on; it just felt at times like there was weird movement. Like, why move the queues at all at this point? But it doesn't really have a consequence in that change, and I guess I should assume it makes later diffs or changes easier to understand21:47
@jim:acmegating.com(that's not literally true, btw, i checked, it's only 30%)21:48
@jim:acmegating.comClark: yeah, please feel free to ask tho :)21:48
@jim:acmegating.comi think you'll see a lot of that on object boundaries though... like do the pipeline state without the queues, then do the queues without the items, then the items without the buildsets, etc..21:49
@clarkb:matrix.orgShould we approve https://review.opendev.org/c/zuul/zuul/+/809414 and child? Or are you waiting for the Zuul +1s too?21:50
@jim:acmegating.comone thing to keep in mind is that the znode hierarchy means we have to go in that order21:50
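ZooKeeper only creates a znode whose parent already exists (kazoo's `create` offers `makepath=True` to create missing ancestors), which is why the pipeline has to exist before its queues, and the queues before their items. A toy illustration with a dict-backed fake; `FakeZK`, `create_all` and the paths are hypothetical.

```python
from typing import Dict, Iterable


class FakeZK:
    """Toy stand-in for ZooKeeper: a node can only be created under an existing parent."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}

    def create(self, path: str, value: bytes = b"") -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent of {path} does not exist")
        self.nodes[path] = value


def create_all(zk: FakeZK, paths: Iterable[str]) -> None:
    """Create nodes shallowest-first so every parent exists before its children."""
    for path in sorted(paths, key=lambda p: p.count("/")):
        zk.create(path)
```

`create_all(zk, ["/pipeline/queue/item", "/pipeline", "/pipeline/queue"])` succeeds, while creating `/pipeline/queue` on an empty store fails.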
@jim:acmegating.comi think we can +w them; was waiting for conversation to settle :)21:51
@jim:acmegating.comClark: if you feel like one more sos change, https://review.opendev.org/815196 tidies up a loose end22:15
@jim:acmegating.comi went and audited, and i think that's the last remaining stats call we need to wrap in an election22:16
-@gerrit:opendev.org- Ian Wienand proposed:22:26
- [zuul/nodepool] 806312: Update Docker and bindep for Bullseye base images https://review.opendev.org/c/zuul/nodepool/+/806312
- [zuul/nodepool] 814830: Switch to Python 3.9 images https://review.opendev.org/c/zuul/nodepool/+/814830
@clarkb:matrix.orgianw: ^ sorry I should've maybe been a bit more explicit in https://review.opendev.org/c/zuul/nodepool/+/806312/13/.zuul.yaml it lists the 3.8 images but we use 3.7. So you needed to switch to 3.7 and also switch to buster22:32
@iwienand:matrix.orgClark: umm, bullseye?  hang on let me double check what i've done22:34
@iwienand:matrix.orgoh i see i've mixed 3.7 and 3.8 22:35
@clarkb:matrix.orgalso my 3.8 change times out on the image builds. I guess I should've expected that. But also see notes above about 3.9 and zuul's struggles with it. Specifically we need to remove gear to make 3.9 happy in zuul. I think switching nodepool first with the goal to also switch zuul is reasonable. particularly since nodepool is already not on the same version of python as zuul (though that should be the goal) 22:36
@iwienand:matrix.orgahh, that's what confused me.  current master uses 3.7 in the Dockerfile but lists 3.8 images as dependencies for the zuul jobs22:38
@clarkb:matrix.orgya and it seems that decision was made to use the arm64 wheels that are already built for us otherwise the image builds time out (as discovered by my test of 3.8 builds)22:40
@iwienand:matrix.orgwe should have up-to-date arm64 3.8 wheels.  but as we discussed previously even with wheels, pip is a single-process bottleneck on the cross-builds22:40
@clarkb:matrix.orgare you sure we have 3.8 wheels? I think buster is 3.7 and bullseye is 3.9 so not sure where 3.8 would come from22:40
@clarkb:matrix.orgI'm not sure it would be safe to run ubuntu 3.8 wheels on bullseye22:40
@iwienand:matrix.orgahh, yes, that's true22:42
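A CPython-specific wheel's python tag (e.g. `cp37`, `cp39`) ties it to one interpreter version, which is why buster (3.7) and bullseye (3.9) wheels can't serve a 3.8 image. A simplified sketch that reads the tag from a wheel filename; the helper names and example filename are hypothetical. Matching the tag does not settle Clark's other concern: a wheel built on Ubuntu may still link against system libraries that differ on Debian.

```python
import sys
from typing import Tuple


def wheel_python_tag(filename: str) -> str:
    """Return the python tag (e.g. 'cp39') from a wheel filename.

    Wheel filenames are name-version[-build]-python-abi-platform.whl, so the
    python tag is always the third dash-separated field from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    return filename[: -len(".whl")].split("-")[-3]


def matches_interpreter(filename: str,
                        version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True if a CPython-specific wheel targets the given major.minor version.

    Simplified: ignores compressed tag sets like 'cp38.cp39' and generic 'py3' tags.
    """
    major, minor = version[0], version[1]
    return wheel_python_tag(filename) == f"cp{major}{minor}"
```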
-@gerrit:opendev.org- Ian Wienand proposed:22:43
- [zuul/nodepool] 806312: Update Docker and bindep for Bullseye base images https://review.opendev.org/c/zuul/nodepool/+/806312
- [zuul/nodepool] 814830: Switch to Python 3.9 images https://review.opendev.org/c/zuul/nodepool/+/814830
@iwienand:matrix.orgClark: ^ that should make everything 3.7, then the next everything 3.922:43
@clarkb:matrix.organd actually that brings up the question of how safe it is to use buster 3.7 wheels on bullseye. I think that 806312 does this?22:43
@clarkb:matrix.orgwe aren't doing functional testing on arm right? So we wouldn't have signal if that is breaking anything?22:44
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 809293: Add an API for ZK-backed objects https://review.opendev.org/c/zuul/zuul/+/809293 22:45
@iwienand:matrix.orgClark:  we are not really testing the arm64 container.  i had https://review.opendev.org/c/openstack/diskimage-builder/+/791888 out to switch functional testing to using the container environment22:46
@iwienand:matrix.orgi'm not so sure on that now22:47
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: [zuul/zuul] 735586: Zuul-web: Add authentication-realm attribute to tenants https://review.opendev.org/c/zuul/zuul/+/735586 23:06
@iwienand:matrix.orgClark: it's an interesting point.  i think our use of wheel bits is very minimal and very unlikely to break.  but it might be an idea to just squash the 3.9 update into the bullseye update23:07
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: [zuul/zuul] 735586: Zuul-web: Add authentication-realm attribute to tenants https://review.opendev.org/c/zuul/zuul/+/735586 23:11
-@gerrit:opendev.org- Clark Boylan proposed:23:15
- [zuul/zuul] 815389: Run the executor governor more often in testing https://review.opendev.org/c/zuul/zuul/+/815389
- [zuul/zuul] 815390: Disable load sensors during testing https://review.opendev.org/c/zuul/zuul/+/815390
@clarkb:matrix.orgcorvus: ^ related to failures after uncapping concurrency23:15
@clarkb:matrix.orgianw: ya considering testing for that is difficult that may be the safest option23:15
@jim:acmegating.comClark: comment on 38923:19
-@gerrit:opendev.org- Zuul merged on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: [zuul/zuul] 815305: Example docker-compose: support podman https://review.opendev.org/c/zuul/zuul/+/815305 23:20
-@gerrit:opendev.org- Clark Boylan proposed:23:24
- [zuul/zuul] 815389: Run the executor governor more often in testing https://review.opendev.org/c/zuul/zuul/+/815389
- [zuul/zuul] 815390: Disable load sensors during testing https://review.opendev.org/c/zuul/zuul/+/815390
@clarkb:matrix.orgcorvus: ^ thanks I think that addresses it23:24
@jim:acmegating.comClark: and yeah, that looks like the kind of load average profile that caused us to want to cap concurrency in the first place23:25
@clarkb:matrix.orgya if the sampling and calculation of load average is frequent enough on the system side (I don't know how often linux does that calculation) then checking it more often in the tests might be sufficient to make this reliable23:26
@clarkb:matrix.orgbasically instead of sampling once and giving up on a 30 second test timeout we can sample ~20 more times or whatever the time span is and hopefully turn the executor back on again23:27
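A sketch of the "sample more often" idea, assuming the governor rule from the executor's `load_multiplier` setting (accept work while the 1-minute load average is below multiplier times CPU count, 2.5 by default). This is not Zuul's governor code; the sampler and sleep are injectable so the loop can be exercised without a real host.

```python
import time
from typing import Callable

# Default multiplier discussed in the chat; 3.0 was floated as a testing-only value.
LOAD_MULTIPLIER = 2.5


def accepting_work(load_1min: float, cpus: int,
                   multiplier: float = LOAD_MULTIPLIER) -> bool:
    """Accept new jobs only while the 1-minute load average is below multiplier * cpus."""
    return load_1min < multiplier * cpus


def wait_until_accepting(sample: Callable[[], float], cpus: int,
                         timeout: float, interval: float,
                         multiplier: float = LOAD_MULTIPLIER,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Re-sample the load every `interval` seconds until work is accepted.

    Returns False if the load is still too high once `timeout` seconds
    have been spent waiting.
    """
    waited = 0.0
    while True:
        if accepting_work(sample(), cpus, multiplier):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

On a real host the sampler could be `lambda: os.getloadavg()[0]` with `os.cpu_count()` CPUs; sampling every 1.5 seconds across a 30 second test timeout gives the executor about 20 chances to resume instead of one.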
@clarkb:matrix.orgOne interesting thing is we seem to be very near that limit. Another option could be to see if a 3x multiplier is reasonable instaed of 2.5x at least for testing23:27
@clarkb:matrix.orgYet another approach would be to run classes of tests in some sequence where we know that we can use the full concurrency but without significant system load as a result23:28
@clarkb:matrix.org(I suspect that the tests that actually run ansible create much higher load impacts and spreading those out might help)23:28
@jim:acmegating.comis there a reasonable way to accomplish that with testr?23:28
@clarkb:matrix.orgcorvus: I think the most straightforward approach is via test name organization. Basically put the more expensive things in their own path and run those tests by path separately (possibly with a lower concurrency)23:29
@clarkb:matrix.orgBut then you're really relying on the tox target to run all the tests as just running stestr won't get everything23:30
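A rough sketch of the path-based split described above, assuming the expensive (Ansible-running) tests have been moved into their own package. The package names `zuul.tests.unit.fast` and `zuul.tests.unit.ansible` and the concurrency values are hypothetical. Each group gets its own `stestr run` invocation, where the positional argument is a regex filter on test ids and `--concurrency` sets the worker count.

```python
from typing import Dict, List


def stestr_commands(groups: Dict[str, int]) -> List[List[str]]:
    """Build one `stestr run` invocation per test group.

    `groups` maps a test-id filter (a regex, here just a package path) to the
    concurrency to use for that group. Commands run in insertion order.
    """
    return [
        ["stestr", "run", "--concurrency", str(concurrency), test_filter]
        for test_filter, concurrency in groups.items()
    ]


# Hypothetical layout: cheap tests at full concurrency, Ansible tests throttled.
GROUPS = {
    "zuul.tests.unit.fast": 8,
    "zuul.tests.unit.ansible": 2,
}

for cmd in stestr_commands(GROUPS):
    print(" ".join(cmd))
```

As noted above, this only covers the whole suite if every test lives in one of the listed packages, so the tox target (or whatever runs these commands) becomes the only complete entry point.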
@clarkb:matrix.orgfwiw I'm not sold on my two changes, but I thought they might provide us with more useful data and I didn't want to call it a day without pushing something related to this :)23:30
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul-jobs] 815089: ensure-dstat-graph: pull updated branch https://review.opendev.org/c/zuul/zuul-jobs/+/815089 23:32
@jim:acmegating.comClark: ++23:34
@iwienand:matrix.orgClark: corvus are we in agreement to squash the update to 3.9 images into the upgrade-to-bullseye-based-images change, so that we ensure consistency of the arm64 wheels used?23:46
@iwienand:matrix.org(nodepool images)23:47
@clarkb:matrix.orgianw: I think we should. And make a note that zuul should update to 3.9 as soon as it is able to (gear has been removed)23:47
@clarkb:matrix.orgthen we'll be in sync and should attempt to remain in sync from that point on23:47
@jim:acmegating.comoh er i didn't know we had agreed to move to 3923:47
@jim:acmegating.comwhy can't they both be 3.8?23:47
@iwienand:matrix.orgcorvus: because we don't build 3.8 wheels for arm64, and cross-building without prebuilt wheels is too slow23:48
@jim:acmegating.comwhy don't we build 3.8 wheels for arm64?23:48
@clarkb:matrix.orgright we have python3.7 + buster and python3.9 + bullseye wheels available because those are the python versions that come with the distro releases23:48
@iwienand:matrix.orgi should say, 3.8+bullseye+arm64 wheels.  we build the 3.8 wheels on focal23:48
@clarkb:matrix.orgour python3.8 docker images are a fancy special compile of python on top of debian buster and bullseye23:49
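To make the mismatch concrete: the wheel mirrors are built per distro release using that release's system Python, so a Python version only gets wheels on a distro family that ships it. The mapping below uses the real system Pythons of each release; `wheel_builder_for` is a hypothetical helper for illustration, not part of OpenDev's tooling.

```python
from typing import Dict, Optional, Tuple

# System Python shipped by each distro release.
DISTRO_PYTHON: Dict[str, str] = {
    "buster": "3.7",    # Debian 10
    "bullseye": "3.9",  # Debian 11
    "focal": "3.8",     # Ubuntu 20.04
}

DEBIAN: Tuple[str, ...] = ("buster", "bullseye")


def wheel_builder_for(python: str, distro_family: Tuple[str, ...]) -> Optional[str]:
    """Return a release in `distro_family` whose system Python matches, if any."""
    for release in distro_family:
        if DISTRO_PYTHON.get(release) == python:
            return release
    return None


print(wheel_builder_for("3.9", DEBIAN))  # bullseye
print(wheel_builder_for("3.8", DEBIAN))  # None: no Debian release ships 3.8
```

Python 3.8 only has a natural builder on focal, which is why 3.8 wheels exist for Ubuntu but not for the Debian-based images.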
@jim:acmegating.comwho's the "we" building wheels here?23:49
@clarkb:matrix.orgcorvus: opendev builds the wheels from openstack requirements23:49
@clarkb:matrix.orgcorvus: for a large chunk of the python versions on the test nodes opendev supports23:49
@jim:acmegating.comthen it sounds like we need more wheel building jobs to build wheels for the software versions zuul uses?23:50
@clarkb:matrix.orgcorvus: that would be another option23:50
@clarkb:matrix.orghowever, making a python3.8 + debian builder might be weird since debian doesn't ship that? but maybe it does and has all the versions as optional installs23:51
@jim:acmegating.comlike, if we're just coasting by because coincidentally openstack happens to build wheels for some packages we're using, then basically we're just only accidentally able to build nodepool arm images23:51
@iwienand:matrix.orgwell, it's "build nodepool arm images in a reasonable time"23:51
@iwienand:matrix.orgwe could probably bump the job timeout to several hours and it would get there, eventually23:51
@jim:acmegating.comincreasing the timeout doesn't sound like a great option to me :)23:52
@jim:acmegating.comClark: but we build our zuul images with python3.8 on debian, right?  so that combo does exist?23:53
@clarkb:matrix.orgcorvus: yes, but it does turn out that there is a core set of python libs that are very common and require linking to external C libs. cryptography is one that we solved for everyone but there are a few others still like cffi and friends23:53
@iwienand:matrix.orgthose zuul images don't use pre-built wheels though23:53
@clarkb:matrix.orgcorvus: yes, but only in a docker context using the python 3.8 images which happen to do their compile on debian23:53
@clarkb:matrix.orgdebian itself doesn't seem to make a bullseye python3.8 package23:53
@clarkb:matrix.orgthat isn't a complete roadblock but it means the existing tooling to update the wheel mirrors won't work out of the box23:54
@clarkb:matrix.orgwe would have to build the wheels inside the docker container I expect23:54
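A hedged sketch of building wheels inside the container: assemble a `docker run` that mounts a requirements file and an output directory into an official `python:<version>-<distro>` image (e.g. `python:3.8-bullseye`) and runs `pip wheel`. It assumes it runs on a native arm64 build node (emulated cross-building is the slow path mentioned above); the function name and the host paths are hypothetical, and bind-mount host paths must be absolute.

```python
import shlex
from typing import List


def docker_wheel_build(python: str, distro: str, requirements: str, out_dir: str) -> List[str]:
    """Build a `docker run` command that writes wheels for `requirements` to `out_dir`."""
    image = f"python:{python}-{distro}"  # official image tag scheme, e.g. python:3.8-bullseye
    return [
        "docker", "run", "--rm",
        "-v", f"{out_dir}:/wheels",                      # wheels land here on the host
        "-v", f"{requirements}:/requirements.txt:ro",    # read-only requirements file
        image,
        "pip", "wheel", "--wheel-dir", "/wheels", "-r", "/requirements.txt",
    ]


cmd = docker_wheel_build("3.8", "bullseye", "/src/requirements.txt", "/tmp/wheels")
print(shlex.join(cmd))
```

The resulting wheels would still need to be published to the mirror by separate tooling, which this sketch does not cover.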
@jim:acmegating.comClark: oh, you're saying we get 3.8+debian because we use the "python" docker images23:54
@clarkb:matrix.org> <@jim:acmegating.com> Clark: oh, you're saying we get 3.8+debian because we use the "python" docker images23:54
Exactly
@clarkb:matrix.org3.7 and 3.9 work nicely with debian because buster is 3.7 and bullseye is 3.9. 3.8 is this weird in-between where we would have to bootstrap extra stuff ourselves23:55
@clarkb:matrix.orgdoable but hasn't been done yet23:55
@iwienand:matrix.orgnothing is impossible, but the layout of volumes we have chosen, and the mirror setup bits, etc. make building arbitrary combinations of python+distro wheels not straightforward23:55
@iwienand:matrix.orgwe really came at it from an angle of "these are the distros, and we're building wheels for this distro's python"23:56
@jim:acmegating.combut now that we're issuing images, we should probably build wheels for those image platforms.23:57
@clarkb:matrix.orgIdeally we'd address this like we did for cryptography, but the effort involved there is relatively significant and requires the upstream for that dep to get involved, so it's unlikely to be a quick fix23:58
@clarkb:matrix.org(opendev builds arm64 wheels for cryptography that are uploaded to pypi that everyone benefits from)23:58
@jim:acmegating.comClark: can you fill me in on how we addressed this for cryptography?23:58
@iwienand:matrix.org++ yes the ideal situation would be for us to make linuxabi wheels of these dependencies23:59