Thursday, 2021-10-21

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [00:26]
- [zuul/nodepool] 807464: Add metastatic driver https://review.opendev.org/c/zuul/nodepool/+/807464
- [zuul/nodepool] 814837: Add more log messages to azure driver https://review.opendev.org/c/zuul/nodepool/+/814837
<@clarkb:matrix.org> > <@iwienand:matrix.org> so, not sure what we could do to make buildx faster [00:34]
We relied on the wheels for buster to make it faster. Is this time cost with or without python3.9 bullseye prebuilt wheels?
<@jim:acmegating.com> Clark: yes, we probably should test those commands, but i don't think we were planning on ensuring that was in the release; we actually restarted on the commit before those [01:04]
<@jim:acmegating.com> so... given that... [01:06]
<@jim:acmegating.com> zuul-maint: how does this look for a zuul release? commit bfe5a4a93524e1b534851f3d04c4f1ad7d44eec7 (tag: 4.10.3, refs/changes/93/814493/3) [01:06]
<@iwienand:matrix.org> Clark: that's with it pointing at our bullseye 3.9 wheels. pip isn't spending time forking to build, any slowness seems to be limited to whatever it's doing internally [01:15]
<@clarkb:matrix.org> Wow. Possibly doing dependency resolution? Maybe we can make that better with hints to that process [01:15]
-@gerrit:opendev.org- Tristan Cacqueray proposed: [01:28]
- [zuul/zuul] 814842: Demonstrate pragma take over override-checkout https://review.opendev.org/c/zuul/zuul/+/814842
- [zuul/zuul] 814843: Demonstrate removing pragma fix override-checkout https://review.opendev.org/c/zuul/zuul/+/814843
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 814848: Add addtional checks to key deletion testing https://review.opendev.org/c/zuul/zuul/+/814848 [02:05]
<@clarkb:matrix.org> corvus ^ that is a couple extra checks that I thought about just now. Your testing with OpenDev made me realize we should double check that in our tests too [02:05]
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 760804: Store version information in component registry https://review.opendev.org/c/zuul/zuul/+/760804 [02:59]
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 814862: Bail out when a project moves between connections https://review.opendev.org/c/zuul/zuul/+/814862 [07:52]
<@jkt_:matrix.org> hi there, our OpenStack provider is messing with the API endpoint reverse proxy today, and Zuul reports that it cannot SSH to freshly-created VMs anymore ("permission denied") [09:21]
<@jkt_:matrix.org> when I checked the VM's console, I see a line from cloud-init, `ci-info: no authorized SSH keys fingerprints found for user ci.` [09:21]
<@jkt_:matrix.org> is that expected in normal operation, or is this a sign of a bug? how do I debug this further? [09:22]
<@jkt_:matrix.org> I'm on nodepool `3.12.1.dev3` (these are my long-obsolete `runc` patches, nothing to the core), and zuul+nodepool have not been touched in a long time, and neither was their config [09:23]
<@jkt_:matrix.org> per https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2020-08-05.log.html#t2020-08-05T08:41:12 this looks like a botched nova metadata service, that makes sense I guess [09:32]
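A minimal sketch for confirming that diagnosis from inside an affected VM, assuming the cloud exposes the standard OpenStack metadata service at the link-local address 169.254.169.254. Nova publishes the instance's keypair under the `public_keys` field of `openstack/latest/meta_data.json`; an empty or missing field would be consistent with the cloud-init message above. The helper names (`fetch_meta_data`, `public_keys`, `describe`) are hypothetical.

```python
import json
import urllib.request

# Link-local address of the OpenStack (nova) metadata service; only
# reachable from inside the instance itself.
META_DATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def fetch_meta_data(url: str = META_DATA_URL, timeout: float = 5.0) -> dict:
    """Download and decode the instance's meta_data.json document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def public_keys(meta_data: dict) -> dict:
    """Return the {key name: public key} mapping nova advertises.

    An empty dict here matches cloud-init reporting that it found no
    authorized SSH keys for the login user.
    """
    return dict(meta_data.get("public_keys") or {})


def describe(meta_data: dict) -> str:
    """Summarize the key situation in one line."""
    keys = public_keys(meta_data)
    if not keys:
        return "metadata lists no public_keys; suspect the metadata service"
    return "metadata lists keys: " + ", ".join(sorted(keys))
```

On the VM, `print(describe(fetch_meta_data()))` would show whether the metadata service is handing out the keypair nodepool asked for.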
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 814773: Move re-enqueue to pipeline processing https://review.opendev.org/c/zuul/zuul/+/814773 [10:59]
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 814899: Delete old build sets immediately https://review.opendev.org/c/zuul/zuul/+/814899 [11:22]
-@gerrit:opendev.org- Dong Zhang proposed: [zuul/zuul-jobs] 813034: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 [13:16]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [13:45]
- [zuul/nodepool] 814837: Add more log messages to azure driver https://review.opendev.org/c/zuul/nodepool/+/814837
- [zuul/nodepool] 807464: Add metastatic driver https://review.opendev.org/c/zuul/nodepool/+/807464
-@gerrit:opendev.org- Dong Zhang proposed: [zuul/zuul-jobs] 813034: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 [13:47]
<@clarkb:matrix.org> I'll be in a different meeting this morning. But hope to join the zuul bof once my portion of that meeting ends [14:02]
<@jim:acmegating.com> zuul bof starting nowish at https://meetpad.opendev.org/zuul-2021-10-21 [14:03]
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 814996: WIP: Make the ConfigLoader work independently of the Scheduler https://review.opendev.org/c/zuul/zuul/+/814996 [14:04]
<@jim:acmegating.com> direct etherpad link: https://etherpad.opendev.org/p/zuul-2021-10-21 [14:06]
-@gerrit:opendev.org- Simon Westphahl proposed: [14:18]
- [zuul/zuul] 814773: Move re-enqueue to pipeline processing https://review.opendev.org/c/zuul/zuul/+/814773
- [zuul/zuul] 814899: Delete old build sets immediately https://review.opendev.org/c/zuul/zuul/+/814899
<@jim:acmegating.com> we wrapped up the bof; took some notes in the etherpad [15:18]
<@clarkb:matrix.org> corvus: thank you for the notes. I left thoughts on running zuul in opendev using the k8s operator. I think there are some large hurdles to get over there and I'm not sure it is currently a good fit. cc tristanC [15:32]
<@tristanc_:matrix.org> Clark: i was suggesting a standalone deployment, like an untrusted service that the zuul community members would manage. [15:48]
<@clarkb:matrix.org> tristanC: but where does the kubernetes come from? and does the zuul installation just sit idle then? [15:49]
<@tristanc_:matrix.org> I don't know who can provide a kubernetes api, but i was thinking this demo could be used to run third-party CI jobs for zuul-jobs to verify they can run in a nodepool container provider [15:50]
<@tristanc_:matrix.org> in other words, there would be a zuul/k8s-demo-config project with some secrets (a review.opendev.org ci account, the k8s api password) and the zuul crd [15:52]
<@clarkb:matrix.org> I don't think that needs to be a third party CI. But ultimately the problem is opendev has tried to run k8s like three different ways and ran into issues with all of them and now has even fewer people to help with that. If there was a kubernetes we could plug the production nodepool into it [15:55]
<@clarkb:matrix.org> (but also if the kubernetes existed then there is potential to deploy other services in it, though I'm not sure if we should mix prod workload and ci workload so maybe we need two?) [15:56]
<@tristanc_:matrix.org> i meant something untrusted/low risk where we could easily share the cluster api access with community members that would like to help get the operator production ready [15:57]
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 815025: DNM Manual depends on between dib and devstack https://review.opendev.org/c/zuul/zuul-jobs/+/815025 [15:58]
<@tristanc_:matrix.org> alternatively corvus suggested we could use google gerrit zuul for that instead, but i think this would have less visibility than if it was in the opendev zuul tenant. [16:00]
<@clarkb:matrix.org> tristanC: I'm not sure such a thing can exist as it requires we use donated resources in a cloud and shouldn't waste them. That means we should have a minimal level of support and care and actually use the resources properly. But once we've done that we get into the trap where the opendev team is often expected to support an ever growing list of things as the amount of help shrinks :/ [16:03]
<@clarkb:matrix.org> Basically I'm saying I think this is possible, but we should do it in a way that makes it sustainable, which for OpenDev likely means committing to converting over to k8s for many services, as then the cost of running the k8s becomes minimal compared to the gain we get from running the services. But to do that I think we need a fair bit of help. The whole replacing the motor while the plane is flying problem [16:08]
<@tristanc_:matrix.org> Clark: I was hoping such resources could be acquired specifically for this purpose (demonstrate a production ready zuul operator), without needing any work from the opendev team, i.e. it would be fully managed by the zuul community [16:08]
<@clarkb:matrix.org> I see, then I misunderstood what was meant by doing this in opendev [16:09]
<@avass:vassast.org> tristanC: i suppose it may be enough to run a small k3s cluster on a single node? [16:09]
<@tristanc_:matrix.org> Albin Vass: right, or the minimum required resources would do [16:10]
<@tristanc_:matrix.org> i take it from the meeting that the mysql operator needs at least 3 nodes [16:10]
<@avass:vassast.org> Yeah. I think it's possible to run a multi agent cluster on a single node with k3d [16:11]
<@jim:acmegating.com> you can run pxc on one node (we do in tests), but if the goal is a realistic deployment, 3 would be better. [16:14]
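One common reason three nodes is the realistic minimum is quorum arithmetic. Percona XtraDB Cluster is built on Galera, which only keeps serving queries in the partition holding a strict majority of the members. A minimal sketch, assuming equally weighted nodes and ignoring refinements such as graceful leaves and custom node weights:

```python
def quorum_size(nodes: int) -> int:
    """Smallest strict majority of `nodes` equally weighted members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Members that can be lost while a majority still remains."""
    return nodes - quorum_size(nodes)


for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

One and two nodes both tolerate zero failures; three is the smallest size that survives losing a member.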
<@tristanc_:matrix.org> corvus: may i request your feedback on 814676, i can't tell where to look for fixing this issue (using devstack job in our zuul) [16:17]
<@jim:acmegating.com> i'll try to take a look in a bit; i have to afk for a while now [16:17]
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 815025: DNM Manual depends on between dib and devstack https://review.opendev.org/c/zuul/zuul-jobs/+/815025 [16:23]
<@tobias.henkel:matrix.org> mhu, corvus: q on 735586, what do you think? [16:34]
<@mhuin:matrix.org> tobiash: works for me but I haven't followed the zk-related changes in detail, I'll need pointers to do it [17:15]
<@clarkb:matrix.org> fungi: re your comment on https://review.opendev.org/c/opendev/system-config/+/814817/1 I wonder if we should update that script to talk to zk for us? [17:33]
<@clarkb:matrix.org> I think we land my docs change either way then improve as a followup, but thought I'd bring that up here in case other zuul users thought the decrypt secret tool could be a bit more automatic [17:36]
<@clarkb:matrix.org> zuulians https://review.opendev.org/c/zuul/zuul/+/814848 is a test only update to make the changes I made to delete-keys a bit more robustly tested. Nothing needs to change in the actual implementation so not an emergency but it occurred to me we should check the additional behavior that is tested in that change. [18:04]
<@fungicide:matrix.org> > <@clarkb:matrix.org> I think we land my docs change either way then improve as a followup, but thought I'd bring that up here in case other zuul users thought the decrypt secret tool could be a bit more automatic [18:37]
yeah, i think those are separate (albeit related) concerns
<@fungicide:matrix.org> > <@jim:acmegating.com> zuul-maint: how does this look for a zuul release? commit bfe5a4a93524e1b534851f3d04c4f1ad7d44eec7 (tag: 4.10.3, refs/changes/93/814493/3) [18:44]
sorry for the slow response, looks okay to me but opendev is running a later commit than that right? (new enough to have the /components api but not new enough to have the version info)? is that the reason for picking bfe5a4a instead of 1df09a8? i guess the latter would be fodder for a minor version increase to 4.11.0 instead of just a patchlevel increase...
<@clarkb:matrix.org> fungi: that commit is the commit opendev was on before the most recent restart. And ya I think corvus wants to avoid tagging where only the /components api without versions is available [18:46]
<@fungicide:matrix.org> yep, makes perfect sense to me. i'm in favor [18:48]
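The version arithmetic behind fungi's question, as a minimal sketch assuming plain `MAJOR.MINOR.PATCH` tags: a patch-level bump increments only the last component, while a minor bump increments the middle one and resets the patch level. The starting tag 4.10.2 below is an assumption for illustration.

```python
def bump(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for `part`."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("4.10.2", "patch"))  # 4.10.3
print(bump("4.10.2", "minor"))  # 4.11.0
```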
<@jim:acmegating.com> pushed 4.10.3 [20:42]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 814685: DNM: Test unit tests on larger nodes https://review.opendev.org/c/zuul/zuul/+/814685 [20:48]
<@jim:acmegating.com> looking at https://review.opendev.org/814684, i think the issue with the tests in the zk stack is not so much that they are taking longer, but maybe that they are putting more load on zk and/or the host, which is causing zk disconnects. so maybe the way to address that is not to increase the job runtime, but to increase the zk session timeout. (but maybe the real answer is larger nodes; we'll see what 814685 says if i can ever get the syntax right) [20:53]
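A toy model of that reasoning, with made-up numbers: a ZooKeeper session is expired when the server hears nothing from the client for longer than the session timeout, so what matters is the longest stall between heartbeats on a loaded node, not how long the job runs overall. Raising the job timeout leaves that comparison unchanged; raising the session timeout changes it. The gap values and timeouts are hypothetical, not Zuul's actual settings.

```python
from typing import Sequence


def session_survives(heartbeat_gaps: Sequence[float], session_timeout: float) -> bool:
    """Toy check: the session survives if no silence exceeds the timeout.

    `heartbeat_gaps` are the seconds between successive pings the server
    actually received from the client.
    """
    return max(heartbeat_gaps, default=0.0) <= session_timeout


# Hypothetical gaps on an overloaded test node, including one 11.5s stall.
gaps = [1.2, 0.9, 3.4, 11.5, 2.0]
print(session_survives(gaps, 10.0))  # False: the stall outlasts the timeout
print(session_survives(gaps, 30.0))  # True: same load, longer timeout
```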
<@jim:acmegating.com> as a point of interest... when we run yarn in verbose mode, it's not kidding. i think it's responsible for 145,000 lines of job output. maybe we should not use verbose? [20:55]
<@clarkb:matrix.org> woah ++ to not being verbose in that case [20:55]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 815072: Don't use --verbose with yarn https://review.opendev.org/c/zuul/zuul/+/815072 [21:00]
<@jim:acmegating.com> originally added in https://review.opendev.org/743623 -- i'm assuming it was just to confirm we were using the opendev mirrors [21:00]
<@jim:acmegating.com> and maybe it wasn't as verbose then :) [21:00]
<@clarkb:matrix.org> corvus: https://review.opendev.org/c/zuul/zuul/+/814848 is an easy review if you have time [21:07]
<@jim:acmegating.com> done [21:07]
<@clarkb:matrix.org> Looking at opendev's production zk resource utilization it seems that we use 1.3g of memory and about a whole cpu on the leader. Are the tests hitting zk a lot harder due to density of operations? [21:18]
<@jim:acmegating.com> Clark: i imagine so.... maybe i should dust off the dstat role. but it stands to reason. the tests seem to start reliably failing about the middle of the current "put everything left in zk" stack. and each test thread is probably going to be mostly doing operations that go to zk. and there's usually more than one test running simultaneously. [21:20]
<@clarkb:matrix.org> ya and it's a sprint to get through the tests vs zuul in production which tends to have a smoother input of events [21:21]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 815077: Run dstat in tox jobs https://review.opendev.org/c/zuul/zuul/+/815077 [21:26]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 815078: Return dstat graph artifact https://review.opendev.org/c/zuul/zuul-jobs/+/815078 [21:32]
<@jim:acmegating.com> on the off chance the dstat roles still work, that might be nice [21:32]
<@jim:acmegating.com> .3 release jobs succeeded [21:34]
<@jim:acmegating.com> ianw, fungi, Clark: can you take a look at the comments on https://review.opendev.org/815078 [22:09]
<@jim:acmegating.com> i haven't really kept up with the death of dstat... [22:09]
<@jim:acmegating.com> dstat crashes. 'dool' is not available in ubuntu. [22:11]
<@iwienand:matrix.org> didn't redhat "take it over" and then roll it into ... copilot or something like that? [22:12]
<@iwienand:matrix.org> i seem to remember dealing with installing bits of this on fedora at some point [22:13]
<@jim:acmegating.com> hrm, i'm not seeing options for installing co-pilot or copilot on ubuntu [22:14]
<@clarkb:matrix.org> it's performance co-pilot or pcp. Problem is the packages don't work reliably [22:14]
<@iwienand:matrix.org> pmlogger might be what we want in 2021 [22:14]
<@clarkb:matrix.org> https://bugs.launchpad.net/devstack/+bug/1943184 [22:15]
<@clarkb:matrix.org> collectl was what I was looking at replacing pcp with in devstack but I guess it isn't very maintained either [22:15]
<@clarkb:matrix.org> I thought dstat was removed preemptively and didn't realize it didn't work at all [22:15]
<@jim:acmegating.com> Clark: do you think the pcp problem you reported would affect incidental use for unit tests? [22:17]
<@jim:acmegating.com> like, i guess if it fails to start we don't care...? [22:17]
<@clarkb:matrix.org> ya if you install it with ansible saying failed when false or similar then it should be fine [22:18]
<@clarkb:matrix.org> you just won't get the data. In devstack's case the problem is it treats the package installation as an error [22:18]
<@jim:acmegating.com> okay, anyone know how to use pcp? :) [22:18]
<@clarkb:matrix.org> corvus: I think if you install it it creates a drop-in replacement for dstat [22:19]
<@clarkb:matrix.org> that is how devstack uses it at least [22:19]
<@jim:acmegating.com> okay yeah that appears to work [22:19]
<@jim:acmegating.com> `apt-get install pcp; dstat -tcmndrylpg --tcp --output /tmp/test.csv` [22:20]
<@jim:acmegating.com> presumably if we feed that to dstat-graph we'll get a graph [22:20]
<@jim:acmegating.com> is there something more better copilot-ish we should do to collect data and make a graph? [22:20]
<@clarkb:matrix.org> just be sure to make the install fail gracefully and the dstat run itself also fail gracefully? Though the dstat replacement may not rely on the stuff that fails to start up during install [22:20]
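A rough Python analogue of the "fail gracefully" advice, for readers who don't live in Ansible: run the step, report its exit status, but never let a failure abort the surrounding job (in an Ansible task, `failed_when: false` plays this role). The helper name `run_best_effort` is hypothetical; the zuul-jobs roles themselves are Ansible.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def run_best_effort(cmd: Sequence[str]) -> Optional[int]:
    """Run `cmd` and return its exit code without ever raising.

    Returns None when the executable is missing or cannot be started,
    so callers can simply skip stats collection.
    """
    if not cmd or shutil.which(cmd[0]) is None:
        return None  # tool not installed: carry on without data
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except OSError:
        return None  # e.g. the binary exists but cannot be executed
```

For example, `run_best_effort(["apt-get", "install", "-y", "pcp"])` returns a non-zero code on a broken package install instead of failing the job. dstat itself runs until killed unless given a count, so the collection step would be started in the background (e.g. with `subprocess.Popen`) rather than through this blocking helper.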
<@clarkb:matrix.org> I'm not sure about copilot tooling. My only real exposure to it was debugging the devstack issues with it and suggesting collectl as a replacement [22:21]
<@iwienand:matrix.org> corvus: looks like there's "pmdumptext" and then a bunch of gui-type tools [22:21]
<@jim:acmegating.com> and just to confirm, you're not suggesting that now? [22:21]
<@iwienand:matrix.org> pmchart/pmtime [22:21]
<@clarkb:matrix.org> corvus: I think the pcp toolchain is super overkill and poorly designed (leading to the problems with basic package installation). But collectl, like dstat, is not maintained anymore, so it may be the least evil thing if you can ignore the failures [22:22]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 815078: Use pcp instead of dstand and return dstat graph artifact https://review.opendev.org/c/zuul/zuul-jobs/+/815078 [22:26]
<@clarkb:matrix.org> Basically there are a lot of not good options but if pcp works enough and when it doesn't we can keep going then it is likely the simplest option [22:26]
<@jim:acmegating.com> Clark, ianw: maybe it's that easy? ^ [22:26]
<@clarkb:matrix.org> ya that looks like it may avoid the known issues with pcp [22:26]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 815077: Run dstat in tox jobs https://review.opendev.org/c/zuul/zuul/+/815077 [22:27]
<@jim:acmegating.com> added depends-on since we know dstat won't produce data without that patch [22:27]
<@iwienand:matrix.org> i'll put it on the todo list to play with pmlogger, etc. maybe a text dump (with timestamps) and something similar to the download-logs script where it's like "pipe this command to get a gui view of the stats" could be a useful combo [22:29]
<@iwienand:matrix.org> that html view is a bit janky -- i looked at updating it once but every library it uses still has the same name yet has essentially completely rewritten itself in the meantime [22:29]
<@iwienand:matrix.org> i.e. standard javascript [22:29]
<@jim:acmegating.com> ianw: yeah -- i really dig being able to see it in a web page, so that's my #1 priority, but if there's a rich local interface, that's nice too. hopefully pcp can do both, but tbh, i will probably never use the local one. [22:30]
<@jim:acmegating.com> i ran pmgraph long enough to know i have no idea what it wants from me [22:32]
<@jim:acmegating.com> oh it's `failed_when` [22:39]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 815078: Use pcp instead of dstand and return dstat graph artifact https://review.opendev.org/c/zuul/zuul-jobs/+/815078 [22:40]
<@iwienand:matrix.org> just on zuul-jobs, could i get a couple of eyes on https://review.opendev.org/c/zuul/zuul-jobs/+/814695 to remove stretch testing and https://review.opendev.org/c/zuul/zuul-jobs/+/812273, which adds a bit more testing to the rust roles to ensure we don't break pyca [22:47]
<@clarkb:matrix.org> ianw: I guess that first one doesn't use zuul-jobs tagged job creation mechanism for all the platforms? [22:52]
<@iwienand:matrix.org> Clark: no, not sure on the history there, i'm assuming it was targeted at a limited set of platforms [22:53]
<@jim:acmegating.com> oh hey cool the dstat graph change worked - ianw if you want to take another look [22:56]
<@jim:acmegating.com> rechecking the zuul change now [22:57]
<@iwienand:matrix.org> lgtm. the page is not quite right in my firefox but that's the way it's always been [22:58]
<@jim:acmegating.com> there do seem to be some broken bits on the graph page, which i guess is why ianw wanted to update it, but it's something. [22:59]
<@jim:acmegating.com> yeah [22:59]
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-jobs] 815078: Use pcp instead of dstand and return dstat graph artifact https://review.opendev.org/c/zuul/zuul-jobs/+/815078 [23:17]
