@tristanc_:matrix.org | I get it, some new ideas are needed to solve unexpected problems, and I think it's ok to fast-track their implementations, but I worry the important details will be forgotten if we don't write them down (after such a change settles). | 00:04 |
---|---|---|
@jim:acmegating.com | tristanC: yeah. i think "done" is probably after zuul-web is updated to read everything from zk, which is the next and final major effort. i'm sure there will be more to do after that, but by then i think the zk schema and concepts will be fairly settled. basically, it'll be feature complete. :) | 00:16 |
@jim:acmegating.com | i did start on https://review.opendev.org/809300 but it needs a lot more work. it's mostly a placeholder right now. | 00:17 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 817126: Move pipeline state to /zuul/tenant https://review.opendev.org/c/zuul/zuul/+/817126 | 00:19 | |
@spamaps:spamaps.ems.host | ```root@e3b6233e6707:/# nodepool list | 00:20 |
+------------+------------+--------------+-----------+-------------+------+----------+-------------+--------+ | ||
| ID | Provider | Label | Server ID | Public IPv4 | IPv6 | State | Age | Locked | | ||
+------------+------------+--------------+-----------+-------------+------+----------+-------------+--------+ | ||
| 0000000070 | static-vms | ubuntu-focal | None | None | None | building | 00:00:05:11 | locked | | ||
+------------+------------+--------------+-----------+-------------+------+----------+-------------+--------+``` | ||
@spamaps:spamaps.ems.host | How do I delete this node? | 00:20 |
@spamaps:spamaps.ems.host | I have ubuntu-focal set to min-ready: 0 | 00:20 |
@jim:acmegating.com | spamaps: nodepool delete 0000000070 | 00:21 |
@spamaps:spamaps.ems.host | times out | 00:21 |
@spamaps:spamaps.ems.host | it's locked | 00:21 |
@jim:acmegating.com | spamaps: restart the launcher and let it clean it up | 00:21 |
@spamaps:spamaps.ems.host | Ah ok restart cleaned it up right away ty | 00:22 |
@spamaps:spamaps.ems.host | Hopefully some day nodepool will clean up after itself if you delete a provider | 00:22 |
@jim:acmegating.com | https://zuul-ci.org/docs/nodepool/operation.html#removing-a-provider | 00:23 |
@spamaps:spamaps.ems.host | there must be something still in ZK | 00:23 |
@spamaps:spamaps.ems.host | nodepool list shows nothing but deleting the provider angers launcher | 00:24 |
@spamaps:spamaps.ems.host | ```launcher_1 | 2021-11-09 00:18:56,634 INFO nodepool.NodeDeleter: Deleting ZK node id=0000000071, state=deleting, external_id=node | 00:27 |
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: Cannot re-register deleted node Node(hostname='node', username='root', port=22): | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: Traceback (most recent call last): | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 489, in nodeDeletedNotification | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: self.registerNodeFromConfig( | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 188, in registerNodeFromConfig | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: host_keys = self.checkHost(static_node) | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 64, in checkHost | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: keys = nodeutils.nodescan(node["name"], | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/nodeutils.py", line 70, in nodescan | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: addrinfo = socket.getaddrinfo(ip, port)[0] | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/socket.py", line 953, in getaddrinfo | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: for res in _socket.getaddrinfo(host, port, family, type, proto, flags): | ||
launcher_1 | 2021-11-09 00:18:56,744 ERROR nodepool.driver.static.StaticNodeProvider: socket.gaierror: [Errno -2] Name or service not known | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: Couldn't sync node: | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: Traceback (most recent call last): | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 427, in cleanupLeakedResources | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: self.syncNodeCount(registered, node, pool) | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 319, in syncNodeCount | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: self.registerNodeFromConfig( | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 188, in registerNodeFromConfig | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: host_keys = self.checkHost(static_node) | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/driver/static/provider.py", line 64, in checkHost | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: keys = nodeutils.nodescan(node["name"], | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/site-packages/nodepool/nodeutils.py", line 70, in nodescan | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: addrinfo = socket.getaddrinfo(ip, port)[0] | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: File "/usr/local/lib/python3.9/socket.py", line 953, in getaddrinfo | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: for res in _socket.getaddrinfo(host, port, family, type, proto, flags): | ||
launcher_1 | 2021-11-09 00:19:41,977 ERROR nodepool.driver.static.StaticNodeProvider: socket.gaierror: [Errno -2] Name or service not known``` | ||
@spamaps:spamaps.ems.host | launcher prints this every time it starts up | 00:27 |
@spamaps:spamaps.ems.host | and so far won't start any nodes | 00:27 |
@spamaps:spamaps.ems.host | Ah that last part was a bad merge on my part.. now it's making nodes | 00:32 |
@clarkb:matrix.org | arg 816094 failed again | 00:36 |
@clarkb:matrix.org | I'll just let it be until zuul is reloaded with the functional fixes | 00:37 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/816208 | 00:58 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: | 00:58 | |
- [zuul/zuul] 768199: Web UI: add Autoholds, Autohold page https://review.opendev.org/c/zuul/zuul/+/768199 | ||
- [zuul/zuul] 810699: Web UI: Show pipeline types as icons https://review.opendev.org/c/zuul/zuul/+/810699 | ||
- [zuul/zuul] 781858: web UI: allow a privileged user to promote a change https://review.opendev.org/c/zuul/zuul/+/781858 | ||
- [zuul/zuul] 802559: Web UI: Add "Create Autohold Request" form, improve API error messages https://review.opendev.org/c/zuul/zuul/+/802559 | ||
- [zuul/zuul] 769943: Example Docker compose: keycloak integration https://review.opendev.org/c/zuul/zuul/+/769943 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: [zuul/zuul] 768115: Web UI: allow a privileged user to request autohold https://review.opendev.org/c/zuul/zuul/+/768115 | 01:29 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/816208 | 01:30 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: | 01:30 | |
- [zuul/zuul] 768199: Web UI: add Autoholds, Autohold page https://review.opendev.org/c/zuul/zuul/+/768199 | ||
- [zuul/zuul] 810699: Web UI: Show pipeline types as icons https://review.opendev.org/c/zuul/zuul/+/810699 | ||
- [zuul/zuul] 781858: web UI: allow a privileged user to promote a change https://review.opendev.org/c/zuul/zuul/+/781858 | ||
- [zuul/zuul] 802559: Web UI: Add "Create Autohold Request" form, improve API error messages https://review.opendev.org/c/zuul/zuul/+/802559 | ||
- [zuul/zuul] 769943: Example Docker compose: keycloak integration https://review.opendev.org/c/zuul/zuul/+/769943 | ||
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817122: Shard config errors https://review.opendev.org/c/zuul/zuul/+/817122 | 01:38 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 816094: Adjust spacing on status page https://review.opendev.org/c/zuul/zuul/+/816094 | 03:52 | |
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: | 08:17 | |
- [zuul/zuul] 816807: Split up registerScheduler() and onLoad() methods https://review.opendev.org/c/zuul/zuul/+/816807 | ||
- [zuul/zuul] 814996: Make the ConfigLoader work independently of the Scheduler https://review.opendev.org/c/zuul/zuul/+/814996 | ||
- [zuul/zuul] 816361: Load system config and tenant layouts in zuul-web https://review.opendev.org/c/zuul/zuul/+/816361 | ||
- [zuul/zuul] 816362: Implement job freezing API in zuul-web https://review.opendev.org/c/zuul/zuul/+/816362 | ||
- [zuul/zuul] 816514: Implement management events directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/816514 | ||
- [zuul/zuul] 816783: Implement autohold endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/816783 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 08:17 | |
- [zuul/zuul] 817003: Store pipeline status for zuul-web in Zookeeper https://review.opendev.org/c/zuul/zuul/+/817003 | ||
- [zuul/zuul] 817004: Use pipeline status from Zookeeper in zuul-web https://review.opendev.org/c/zuul/zuul/+/817004 | ||
- [zuul/zuul] 817157: Use abide for listing pipelines in zuul-web https://review.opendev.org/c/zuul/zuul/+/817157 | ||
- [zuul/zuul] 817158: Use abide for getting public keys in zuul-web https://review.opendev.org/c/zuul/zuul/+/817158 | ||
-@gerrit:opendev.org- Felix Edel proposed: | 08:23 | |
- [zuul/zuul] 817159: Implement job endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817159 | ||
- [zuul/zuul] 817160: Implement project endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817160 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 08:42 | |
- [zuul/zuul] 817158: Use abide for getting public keys in zuul-web https://review.opendev.org/c/zuul/zuul/+/817158 | ||
- [zuul/zuul] 817164: Use abide for getting project SSH keys in zuul-web https://review.opendev.org/c/zuul/zuul/+/817164 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 09:32 | |
- [zuul/zuul] 817171: Check authentication directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817171 | ||
- [zuul/zuul] 817172: Create list of admin tenants directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817172 | ||
@mhuin:matrix.org | > <@jim:acmegating.com> tristanC: yeah. i think "done" is probably after zuul-web is updated to read everything from zk, which is the next and final major effort. i'm sure there will be more to do after that, but by then i think the zk schema and concepts will be fairly settled. basically, it'll be feature complete. :) | 09:45 |
On that note, the zuul admin CLI still supports a gearman client for autoholds, enqueues etc. How about we remove the stuff that can be done through the web API -and thus zuul-client- and keep only create-auth-token, tenant-conf-check and the ZK-related commands once the web admin GUI is finalized? | ||
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 817174: Use abide to get (dis-)allowed labels in zuul-web https://review.opendev.org/c/zuul/zuul/+/817174 | 09:49 | |
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 817177: Create list of connections directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817177 | 10:23 | |
-@gerrit:opendev.org- Simon Westphahl proposed: | 11:36 | |
- [zuul/zuul] 817164: Use abide for getting project SSH keys in zuul-web https://review.opendev.org/c/zuul/zuul/+/817164 | ||
- [zuul/zuul] 817171: Check authentication directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817171 | ||
- [zuul/zuul] 817172: Create list of admin tenants directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817172 | ||
- [zuul/zuul] 817174: Use abide to get (dis-)allowed labels in zuul-web https://review.opendev.org/c/zuul/zuul/+/817174 | ||
- [zuul/zuul] 817177: Create list of connections directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817177 | ||
- [zuul/zuul] 817193: Use config errors directly from layout in zuul-web https://review.opendev.org/c/zuul/zuul/+/817193 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 11:41 | |
- [zuul/zuul] 817171: Check authentication directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817171 | ||
- [zuul/zuul] 817172: Create list of admin tenants directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817172 | ||
- [zuul/zuul] 817174: Use abide to get (dis-)allowed labels in zuul-web https://review.opendev.org/c/zuul/zuul/+/817174 | ||
- [zuul/zuul] 817177: Create list of connections directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817177 | ||
- [zuul/zuul] 817193: Use config errors directly from layout in zuul-web https://review.opendev.org/c/zuul/zuul/+/817193 | ||
@westphahl:matrix.org | mhu: yep, I think that's the plan | 11:43 |
-@gerrit:opendev.org- Felix Edel proposed: | 11:59 | |
- [zuul/zuul] 817159: Implement job endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817159 | ||
- [zuul/zuul] 817160: Implement project endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817160 | ||
- [zuul/zuul] 817196: Implement tenant endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817196 | ||
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: | 13:44 | |
- [zuul/zuul] 817159: Implement job endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817159 | ||
- [zuul/zuul] 817160: Implement project endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817160 | ||
- [zuul/zuul] 817196: Implement tenant endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817196 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 13:44 | |
- [zuul/zuul] 817174: Use abide to get (dis-)allowed labels in zuul-web https://review.opendev.org/c/zuul/zuul/+/817174 | ||
- [zuul/zuul] 817177: Create list of connections directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817177 | ||
- [zuul/zuul] 817193: Use config errors directly from layout in zuul-web https://review.opendev.org/c/zuul/zuul/+/817193 | ||
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: | 13:45 | |
- [zuul/zuul] 817160: Implement project endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817160 | ||
- [zuul/zuul] 817196: Implement tenant endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817196 | ||
@spamaps:spamaps.ems.host | corvus: correct to say that the GCE driver doesn't support network arguments for instances? I'm going to need those.. happy to add it. | 14:37 |
@spamaps:spamaps.ems.host | Wow we still don't have pretty ansi color log viewer? I thought by now y'all would have gotten that one working. ;) | 14:56 |
@avass:vassast.org | spamaps: we did have one but it couldn't handle large log files so it was reverted until someone implements something that doesn't take >60sec to render large files :) | 14:57 |
@spamaps:spamaps.ems.host | Ah, yeah some of the other CI systems have that same problem. I've noticed they implement as a ring buffer. | 14:57 |
@spamaps:spamaps.ems.host | If you start scrolling up they drop the new stuff. | 14:57 |
-@gerrit:opendev.org- Tristan Cacqueray proposed: [zuul/zuul] 762759: web: integrate re-ansi to render ANSI code in Zuul consoles https://review.opendev.org/c/zuul/zuul/+/762759 | 15:10 | |
@tristanc_:matrix.org | Albin Vass: spamaps well here is a change to add ansi color https://review.opendev.org/c/zuul/zuul/+/762759 . @corvus could you please re-evaluate your -2 ? | 15:12 |
@goneri:matrix.org | Hi! We've recently had some short (less than 60s) network outages, and this seems to be the cause of some deadlocks in nodepool. My understanding ( and I'm not an expert :-) ) is that nodepool gets the request from zs and starts to process it. It pushes the lock in zk and loses the connection. It quickly gives up because of an exception coming from Kazoo ( https://paste.openstack.org/show/810873/ ). But the lock remains. When it tries to retry, it's too late, a lock is already there. | 15:12 |
@clarkb:matrix.org | Gonéri: the node request and the locks should be ephemeral which means if you lose connectivity they go away. I don't know why that connectionloss wouldn't unlock the locks | 15:17 |
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 817228: Remove gearman from zuul-web https://review.opendev.org/c/zuul/zuul/+/817228 | 15:20 | |
@jpew:matrix.org | I have a cache cleanup job I want to run once a week with zuul, and while it's running I don't want any other jobs to run.... is there a way to do that? | 15:30 |
@goneri:matrix.org | Clark: the Kazoo documentation says the ephemeral nodes are deleted if the state transitions to LOST; here we just switch to SUSPENDED. | 15:32 |
@jim:acmegating.com | jpew: the cleanup is a zuul job, or an external thing? | 15:34 |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 816918: Upgrade react-router-dom https://review.opendev.org/c/zuul/zuul/+/816918 | 15:34 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 734082: web UI: user login with OpenID Connect https://review.opendev.org/c/zuul/zuul/+/734082 | 15:34 | |
@jpew:matrix.org | corvus: A zuul job | 15:34 |
@jim:acmegating.com | jpew: i don't have a good solution; semaphores don't have a read/write variant which i assume you would need for that. | 15:35 |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 734850: web UI: allow a privileged user to dequeue a change https://review.opendev.org/c/zuul/zuul/+/734850 | 15:35 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 736772: web UI: allow a privileged user to re-enqueue a change https://review.opendev.org/c/zuul/zuul/+/736772 | 15:36 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 768115: Web UI: allow a privileged user to request autohold https://review.opendev.org/c/zuul/zuul/+/768115 | 15:36 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 768199: Web UI: add Autoholds, Autohold page https://review.opendev.org/c/zuul/zuul/+/768199 | 15:36 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 810699: Web UI: Show pipeline types as icons https://review.opendev.org/c/zuul/zuul/+/810699 | 15:36 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 781858: web UI: allow a privileged user to promote a change https://review.opendev.org/c/zuul/zuul/+/781858 | 15:37 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 802559: Web UI: Add "Create Autohold Request" form, improve API error messages https://review.opendev.org/c/zuul/zuul/+/802559 | 15:37 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 769943: Example Docker compose: keycloak integration https://review.opendev.org/c/zuul/zuul/+/769943 | 15:37 | |
@goneri:matrix.org | My bad, when the connection's back, the state switches LOST -> CONNECTED. So the locks should be removed at this moment. | 15:58 |
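A minimal sketch of the session states being discussed here, assuming Kazoo's documented states (`SUSPENDED`, `CONNECTED`, `LOST`) reach a listener as plain strings. The class `SessionTracker` and its attributes are hypothetical names for illustration; they are not part of Kazoo or nodepool.

```python
# Kazoo reports connection state to listeners as one of these strings
# (kazoo.client.KazooState.SUSPENDED / CONNECTED / LOST).
SUSPENDED = "SUSPENDED"
CONNECTED = "CONNECTED"
LOST = "LOST"


class SessionTracker:
    """Hypothetical listener: tracks whether ephemeral locks may still exist."""

    def __init__(self):
        self.paused = True            # no usable connection yet
        self.must_reacquire = False   # True once a session has expired
        self.history = []

    def __call__(self, state):
        self.history.append(state)
        if state == SUSPENDED:
            # Connection dropped, but the session may still be alive on the
            # server, so its ephemeral nodes (and locks) are not deleted yet.
            self.paused = True
        elif state == LOST:
            # Session expired: the server removes its ephemeral nodes.
            self.paused = True
            self.must_reacquire = True
        elif state == CONNECTED:
            self.paused = False


tracker = SessionTracker()
for state in (CONNECTED, SUSPENDED, LOST, CONNECTED):
    tracker(state)
print(tracker.paused, tracker.must_reacquire)  # False True
```

With a real client, an object like this could be registered through `KazooClient.add_listener(tracker)`; the point is only that SUSPENDED alone does not imply the ephemeral lock is gone, while LOST does.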
@goneri:matrix.org | Actually, the node request's got ephemeralOwner == 0 in zk: https://paste.openstack.org/show/810883/ | 16:17 |
@jim:acmegating.com | correct, node requests are not ephemeral | 16:17 |
@goneri:matrix.org | Oh, so this explains my endless loop of [node_request: 300-0000678000] Request is locked by someone else. | 16:18 |
@clarkb:matrix.org | but the locks are ephemeral? | 16:18 |
@jim:acmegating.com | locks are ephemeral | 16:19 |
@jim:acmegating.com | well, some are | 16:19 |
@jim:acmegating.com | nodepool's lock on the request is | 16:19 |
@clarkb:matrix.org | Gonéri: you can list the ephemeral nodes and their sessions. Then you can list sessions which includes IP addrs to identify the lock owner | 16:20 |
@goneri:matrix.org | Well, I think my problem is at the node request lock. | 16:21 |
@clarkb:matrix.org | dump to list the ephemeral nodes and sessions and cons for sessions | 16:21 |
@goneri:matrix.org | The node request is not processed and the lock remains. | 16:21 |
@jim:acmegating.com | Clark: can you take a look at https://review.opendev.org/817126 ? i think we might want to go ahead and rip the bandage off on that one. | 16:22 |
@clarkb:matrix.org | I always have to look that up. maybe I should write it down somewhere | 16:22 |
@clarkb:matrix.org | corvus: that will require a stop, clear of zk data, then start? Or just start with new data on the new path and sometime later delete the old path contents? | 16:23 |
@jim:acmegating.com | Clark: i'd do the first i think | 16:24 |
@jim:acmegating.com | but the second would work | 16:24 |
@clarkb:matrix.org | corvus: does my nit there change anything? perhaps there are other places we need to update if there was a pipeline name conflict (I think the conflict may be with tenant names?) | 16:26 |
@jim:acmegating.com | Clark: you're right, it was just a commit message typo | 16:26 |
@jim:acmegating.com | i think we should just go ahead and approve it | 16:27 |
@clarkb:matrix.org | wfm I'll +A | 16:28 |
@clarkb:matrix.org | I was just concerned I might have missed something important there and if I had it was worth double checking :) | 16:28 |
@jim:acmegating.com | Gonéri: Ie5aae0704e5925a5bcc73cc6bc0bcb91287ab26e fixed an issue with node requests in zuul; i doubt it could cause the behavior you're seeing, but perhaps related. just fyi. | 16:30 |
@goneri:matrix.org | An uncaught exception that happens after the node request lock in _assignHandlers https://paste.openstack.org/show/810885/ | 16:38 |
@goneri:matrix.org | note: my node request state in zk is: "requested". | 16:49 |
@goneri:matrix.org | If I understand correctly, _removeCompletedHandlers() is in charge of cleaning this kind of unachieved node request. It loops on self.request_handlers list. But it does not include our thread. Shouldn't this be done before? https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L199 e.g: https://paste.openstack.org/show/810886/ | 17:02 |
@clarkb:matrix.org | Gonéri: is the request internally completed? I wonder if the exception caused it to bail out early so a necessary state transition is not occurring | 17:11 |
@clarkb:matrix.org | I would inspect the znode record in zk directly to check that | 17:12 |
@goneri:matrix.org | Yes, the node request is stuck. Nothing has been started. | 17:12 |
@clarkb:matrix.org | should give you the current state and other info | 17:12 |
@goneri:matrix.org | I pasted that above, is this what you want? https://paste.openstack.org/show/810883/ | 17:13 |
@clarkb:matrix.org | I'm wondering if this is the thing opendev has seen in the past where we have had to restart launchers | 17:13 |
@clarkb:matrix.org | but we haven't seen that recently | 17:13 |
@clarkb:matrix.org | Gonéri: no the actual content of the record in the database not the metadata. That should tell you who has declined it, who is currently working on it, what state the request is in and so on | 17:14 |
@goneri:matrix.org | From the log, it's clear nothing is working on it. I've just an endless loop of "Request is locked by someone else". | 17:15 |
@goneri:matrix.org | It has been like that for 12h. | 17:15 |
@clarkb:matrix.org | right which is why I've suggested you determine who it is locked by and work from there | 17:20 |
@clarkb:matrix.org | I think there are two approaches for that. Either examine the record data directly to see if it was recorded or use the two zookeeper four letter commands to figure it out | 17:20 |
@clarkb:matrix.org | since connection losses should remove ephemeral locks | 17:20 |
@clarkb:matrix.org | I think it is a good idea to determine how the lock exists | 17:21 |
@goneri:matrix.org | Clark: I need your help on this, where is the information you're looking for? In MariaDB or in zk? | 17:37 |
@clarkb:matrix.org | Gonéri: all in zookeeper. The first is similar to what you have in your paste at https://paste.openstack.org/show/810883/ but instead of stat'ing the record you can get the record (I think the command is get). The second is zookeeper's 4 letter commands which can be sent to the client port but depending on how your zookeeper is deployed they may or may not be enabled https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands There is also a rest api that accepts the four letter commands that may or may not be enabled | 17:39 |
@clarkb:matrix.org | The idea is to use that to get the identity of the lock holder in nodepool or zuul. Then from that figure out why the lock is held even though no action is taken | 17:40 |
@goneri:matrix.org | ah! thanks :-) https://paste.openstack.org/show/810888/ | 17:40 |
@clarkb:matrix.org | interesting it doesn't have any declined by entries | 17:42 |
@clarkb:matrix.org | and is still in the requested state. This must have happened very early in the request process | 17:43 |
@goneri:matrix.org | Yes, this one happened at the beginning of the first try. And zk was away after, so no way to update the record. | 17:43 |
@clarkb:matrix.org | right but the lock should've been removed? then processing could try again via locking it and proceeding | 17:43 |
@goneri:matrix.org | This is how I would address that: https://paste.openstack.org/show/810889/ | 17:44 |
@clarkb:matrix.org | I think the idea is the request handler failed and the lock goes away so we don't need to continue to track that request handler but I may be missing something important around that | 17:45 |
@goneri:matrix.org | The exception is not caught and so we break the loop. But the lock remains. | 17:45 |
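The failure mode described here (lock taken, exception raised, lock never released) can be sketched generically. This is not nodepool's code and not the diff from the paste above; `RequestLocks`, `assign_handler` and `failing_start` are hypothetical stand-ins, and an in-memory set replaces ZooKeeper.

```python
class RequestLocks:
    """Hypothetical stand-in for a ZooKeeper-backed lock API."""

    def __init__(self):
        self.held = set()

    def lock(self, request_id):
        if request_id in self.held:
            raise RuntimeError("Request is locked by someone else")
        self.held.add(request_id)

    def unlock(self, request_id):
        self.held.discard(request_id)


def assign_handler(locks, request_id, start_handler):
    """Lock a request and start a handler, unlocking if the start fails."""
    locks.lock(request_id)
    try:
        start_handler(request_id)
    except Exception:
        # Without this, the lock outlives the failed attempt and every
        # later pass reports the request as locked by someone else.
        locks.unlock(request_id)
        raise


def failing_start(request_id):
    # Simulates the connection dropping right after the lock was taken.
    raise ConnectionError("connection lost")


locks = RequestLocks()
try:
    assign_handler(locks, "300-0000678000", failing_start)
except ConnectionError:
    pass
print(locks.held)  # set()
```

Whether the real lock is held by nodepool at all is still the open question in this thread, so this only illustrates the pattern, not a confirmed fix.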
@clarkb:matrix.org | One reason I'm asking about double checking the lock holder is it may be possible that zuul holds the lock (though that is probably unlikely) | 17:45 |
@goneri:matrix.org | We've got 3 node requests like that because of this 1 minute outage. | 17:46 |
@clarkb:matrix.org | right but we should confirm it is nodepool that holds the lock before making a change like the one in your diff (I think the change in your diff is probably safe either way but we may not be looking at the right spot if the lock holder is zuul) | 17:47 |
@goneri:matrix.org | How can I know who holds the lock? | 17:49 |
@clarkb:matrix.org | for that I think you need to use the four letter commands. You can run `dump` to get a listing of all the ephemeral nodes and their associated sessions (this should include the lock). Then you can run `cons` to get all the connections mapped to sessions. | 17:50 |
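For reference, a minimal sketch of sending a four-letter command such as `dump` or `cons` to the ZooKeeper client port over a plain TCP socket. The function name `four_letter_word`, the host and the port are assumptions for illustration; as noted earlier in the discussion, the commands may not be enabled on a given deployment.

```python
import socket


def four_letter_word(host: str, port: int, command: str,
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once the reply is complete.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example usage (assumes a ZooKeeper server on localhost:2181):
#   print(four_letter_word("localhost", 2181, "dump"))  # ephemeral nodes per session
#   print(four_letter_word("localhost", 2181, "cons"))  # connections per session
```

The session id listed next to a lock path in the `dump` reply can then be looked up in the `cons` reply to find the client address that owns it.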
@goneri:matrix.org | "/nodepool/requests/300-0000678000" : [ 72408799404359680 ], | 17:52 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817126: Move pipeline state to /zuul/tenant https://review.opendev.org/c/zuul/zuul/+/817126 | 17:52 | |
@clarkb:matrix.org | Gonéri: I think the lock path is something like /nodepool/requests/foo-bar/lock? | 17:52 |
@goneri:matrix.org | this is nl01, my nodepool node. | 17:53 |
@clarkb:matrix.org | for the /lock path too? | 17:53 |
@goneri:matrix.org | For the following lock: /nodepool/requests/300-0000678000 | 17:54 |
@clarkb:matrix.org | hrm I don't think that is the lock that is just the request? | 17:54 |
@clarkb:matrix.org | then there is a separate `/nodepool/requests/300-0000678000/lock` path? But maybe we don't lock that way for node requests | 17:55 |
@goneri:matrix.org | ``` | 17:56 |
ls /nodepool/requests/300-0000678000 | ||
[] | ||
``` | ||
@jim:acmegating.com | Clark: regarding the issue of the estimated time not showing up; we have a thread doing that in the background. we prime it when we get the node request back, and we get the actual value when the job starts. but that can happen on two different schedulers now, so i think that needs some re-thinking. | 17:56 |
@clarkb:matrix.org | Gonéri: interesting, I wonder if that means the lock really did go away? when you say the lock remains above are you talking about `/nodepool/requests/300-0000678000` ? | 17:57 |
@clarkb:matrix.org | in that case I would agree that the request remains and that your fix seems like a possible fix (however, I would've expected the launcher to simply try handling the request again since the previous handler failed) | 17:58 |
@clarkb:matrix.org | corvus: oh neat. I'm glad this early running in OpenDev has been so productive :) | 17:58 |
@goneri:matrix.org | I assume the node_request was locked because of the `Request is locked by someone else` messages. that's all :-). | 17:59 |
@clarkb:matrix.org | aha | 18:01 |
@clarkb:matrix.org | `/nodepool/requests-lock/300-0000678000` is the path. Can you check who holds that? | 18:02 |
@goneri:matrix.org | sure | 18:04 |
@goneri:matrix.org | Well, the lock exists but nothing holds it. | 18:06 |
-@gerrit:opendev.org- Clint Byrum proposed: [zuul/zuul-jobs] 817291: Remove google_sudoers in revoke-sudo https://review.opendev.org/c/zuul/zuul-jobs/+/817291 | 18:10 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 817292: Make time estimation synchronous https://review.opendev.org/c/zuul/zuul/+/817292 | 18:11 | |
@clarkb:matrix.org | Gonéri: in your dump output that path should show up with a session that should show you who holds it | 18:11 |
@jim:acmegating.com | Clark, tobiash: we expected https://review.opendev.org/c/zuul/zuul/+/817292 i think the time has come | 18:12 |
@goneri:matrix.org | I did curl http://localhost:8080/commands/watches_by_path. I can see all the locks and they are all in /nodepool/nodes/*/lock. | 18:13 |
@clarkb:matrix.org | Gonéri: I don't know that nodepool watches the requests path. I think it may iterate through it | 18:14 |
@goneri:matrix.org | https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L181 it looks to me like it uses the node request directly. | 18:17 |
@clarkb:matrix.org | Gonéri: does /nodepool/requests-lock/300-0000678000 show up in the dump output? | 18:19 |
@clarkb:matrix.org | that will show you which session has the lock | 18:19 |
@goneri:matrix.org | the entry exists if this is the question. However, it's not locked. | 18:21 |
@goneri:matrix.org | actually, when I do a `ls /nodepool/requests-lock`, I see some very old entries. Looks like nothing cleans them up. | 18:22 |
@clarkb:matrix.org | well they should be ephemeral. Looks like they use the base kazoo Lock recipe | 18:23 |
@goneri:matrix.org | 537 entries! | 18:23 |
@clarkb:matrix.org | ya confirmed https://kazoo.readthedocs.io/en/latest/api/recipe/lock.html#kazoo.recipe.lock.Lock.acquire shows the default is ephemeral and nodepool doesn't seem to override that in any of its acquire calls | 18:24 |
@clarkb:matrix.org | I wonder if zookeeper didn't consider the connection to be lost | 18:25 |
@clarkb:matrix.org | and kazoo is relying on zookeeper to clean those up in a connection loss state. | 18:25 |
@clarkb:matrix.org | maybe check your zookeeper logs and ss/netstat to see if the connection was ever recorded as lost? maybe it is still there? | 18:26 |
@goneri:matrix.org | I don't think this is what causes our problem. req here https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L181 comes from https://opendev.org/zuul/nodepool/src/branch/master/nodepool/zk.py#L1742. It's the regular path pointing to /nodepool/requests/300-0000678000. | 18:29 |
@goneri:matrix.org | and this is the req nodepool tries to lock here: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L181 | 18:30 |
@clarkb:matrix.org | Gonéri: the 'Request is locked' messages that you are getting are from nodepool trying to process the request. It would be able to process it if it wasn't locked | 18:30 |
@clarkb:matrix.org | The way a disconnect should be handled if I understand correctly is that the requests will be unlocked and then when reconnection happens nodepool can reprocess them | 18:31 |
@clarkb:matrix.org | but this is failing due to the lock | 18:31 |
@clarkb:matrix.org | If we understand why the lock doesn't go away then we can fix this | 18:31 |
@goneri:matrix.org | I mean, I think there are two different paths. And the one nodepool uses in this block is /nodepool/requests/300-0000678000 | 18:31 |
@clarkb:matrix.org | there are two different paths. `/nodepool/requests/300-0000678000` is where the request details are written. How many nodes, what type of nodes etc. Then `/nodepool/requests-lock/300-0000678000` is used to lock the first path so that only one launcher or zuul are modifying it at a time | 18:33 |
@clarkb:matrix.org | The recovery process is failing because somehow that lock remains in place. If the lock goes away then recovery should proceed (and this is the anticipated behavior) | 18:33 |
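The split between the request path and the lock path described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical and are not nodepool's API; only the two path prefixes come from the discussion.

```python
import posixpath

# Path prefixes taken from the discussion above.
REQUEST_ROOT = "/nodepool/requests"
REQUEST_LOCK_ROOT = "/nodepool/requests-lock"


def request_path(request_id: str) -> str:
    """Znode holding the request details (how many nodes, which labels, ...)."""
    return posixpath.join(REQUEST_ROOT, request_id)


def request_lock_path(request_id: str) -> str:
    """Znode under which lock contenders for that request are created."""
    return posixpath.join(REQUEST_LOCK_ROOT, request_id)


def lock_path_for(path: str) -> str:
    """Map a request path to the lock path guarding it (hypothetical helper)."""
    root, request_id = posixpath.split(path)
    if root != REQUEST_ROOT or not request_id:
        raise ValueError(f"not a node request path: {path}")
    return request_lock_path(request_id)
```

So an empty `ls` on the request path says nothing about the lock: the contenders live under the sibling `requests-lock` tree, and that is the path to look for in the dump output.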
@clarkb:matrix.org | I wonder if the disconnection doesn't kill the session somehow and that is what keeps the lock in place but the code that noticed the disconnect has bailed out so cannot process the request further | 18:34 |
@clarkb:matrix.org | corvus: ^ do you know if you can disconnect without losing the session? | 18:34 |
@jim:acmegating.com | Clark: a session is lost on disconnect | 18:35 |
@goneri:matrix.org | oh I see, I didn't realize lockNodeRequest uses a different lock path internally | 18:35 |
@goneri:matrix.org | thank you for your patience :-) | 18:36 |
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: a session is lost on disconnect | 18:37 |
That has me thinking that zookeeper didn't notice the disconnect somehow and only kazoo seemed to notice | ||
@clarkb:matrix.org | because it does seem that we rely on the zookeeper side of things to clean up ephemeral nodes | 18:38 |
@jim:acmegating.com | that seems unlikely to be the issue. i think we'd need some solid proof for that. | 18:39 |
@clarkb:matrix.org | I'm going to have to stop looking at this momentarily to prep for the opendev team meeting. | 18:39 |
@goneri:matrix.org | So indeed, there is a /nodepool/requests-lock/300-0000678000/380fe6217cb947c88c3ac7b6c2c008d7__lock__0000000000 lock in the dump output and it's associated with nl01 (same session 72408799404359680). | 18:41 |
@clarkb:matrix.org | ok good that confirms zuul doesn't have the lock | 18:44 |
@clarkb:matrix.org | which means the problem is independent of zuul | 18:44 |
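For reading entries like the one Gonéri found in the dump, here is a small Python sketch that splits a lock contender path into its parts. It assumes the contender znode is named `<hex id>__lock__<sequence>`, which matches the entry quoted above and, as far as I understand it, the naming used by kazoo's Lock recipe; `parse_contender` and `Contender` are hypothetical names for illustration.

```python
import re
from typing import NamedTuple

# Assumed naming: "<hex id>__lock__<zero-padded sequence>".
CONTENDER_RE = re.compile(r"^(?P<uid>[0-9a-f]+)__lock__(?P<seq>\d+)$")


class Contender(NamedTuple):
    request_id: str  # e.g. "300-0000678000"
    uid: str         # random hex id chosen by the lock holder's client
    sequence: int    # ZooKeeper sequence number; lowest one holds the lock


def parse_contender(path: str) -> Contender:
    """Split /nodepool/requests-lock/<request>/<contender> into its parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["nodepool", "requests-lock"]:
        raise ValueError(f"not a request lock contender path: {path}")
    match = CONTENDER_RE.match(parts[3])
    if match is None:
        raise ValueError(f"unexpected contender name: {parts[3]}")
    return Contender(parts[2], match.group("uid"), int(match.group("seq")))
```

The session id that owns the znode is not part of the path; it still has to be read from the dump output, as was done above.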
@clarkb:matrix.org | But I must context switch now | 18:44 |
@clarkb:matrix.org | corvus: I'll try to review the time db change after the opendev meeting | 18:46 |
@jim:acmegating.com | Clark: thx; btw i've been going through the #sos stack to do the zuul-web stuff, i think it's 99% there and ready for other reviewers | 18:47 |
@clarkb:matrix.org | corvus: noted | 18:49 |
@jim:acmegating.com | mhu: i think we could use your input on https://review.opendev.org/817172 | 19:00 |
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: | 19:10 | |
- [zuul/zuul] 817159: Implement job endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817159 | ||
- [zuul/zuul] 817160: Implement project endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817160 | ||
- [zuul/zuul] 817196: Implement tenant endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817196 | ||
- [zuul/zuul] 817228: Remove gearman from zuul-web https://review.opendev.org/c/zuul/zuul/+/817228 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 19:10 | |
- [zuul/zuul] 817164: Use abide for getting project SSH keys in zuul-web https://review.opendev.org/c/zuul/zuul/+/817164 | ||
- [zuul/zuul] 817171: Check authentication directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817171 | ||
- [zuul/zuul] 817172: Create list of admin tenants directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817172 | ||
- [zuul/zuul] 817174: Use abide to get (dis-)allowed labels in zuul-web https://review.opendev.org/c/zuul/zuul/+/817174 | ||
- [zuul/zuul] 817177: Create list of connections directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/817177 | ||
- [zuul/zuul] 817193: Use config errors directly from layout in zuul-web https://review.opendev.org/c/zuul/zuul/+/817193 | ||
@westphahl:matrix.org | corvus: responded to your comment in 817003. I won't have the time to fix this today, so feel free to ninja fix if you find the time | 19:17 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 817298: Add tag to tag serialization https://review.opendev.org/c/zuul/zuul/+/817298 | 19:18 | |
@jim:acmegating.com | swest: ack thx. | 19:19 |
@jim:acmegating.com | Clark: ^ that change should fix the tag issue mentioned in #opendev | 19:19 |
@clarkb:matrix.org | sounds like my hunch was a good one | 19:27 |
@clarkb:matrix.org | corvus: question on https://review.opendev.org/c/zuul/zuul/+/817292 but +2'd as it isn't very important and can be cleaned up in a followup | 20:05 |
@jim:acmegating.com | tristanC: done :) | 20:08 |
@jim:acmegating.com | Clark: do you think https://review.opendev.org/817298 is sufficiently reviewed to approve? (cc: zuul-maint) I think that might be a good candidate to get in now, then do an opendev clear zk data and restart this afternoon/evening. | 20:10 |
@jim:acmegating.com | that would get the tag fix as well as the pipeline relocation | 20:11 |
@clarkb:matrix.org | corvus: ++ not sure if we want to try and track down the nodeset already defined issue before restarting though? | 20:24 |
@jpew:matrix.org | I have two jobs: "build" and "test". "build" builds then "provides" build artifacts and "test" then "requires" those artifacts and tests them. For $REASONS, "test" loops over all provided artifacts and will test any found in a loop. However, I've noticed that when changes start stacking, later "test" jobs will get artifacts from the previously stacked patchsets (we are using Gerrit BTW), meaning it runs the tests multiple times: once for the current patch and once for each patchset the current one was stacked on top of..... I don't really want that latter part to happen (although I assume there is some mindblowing reason why Zuul does it :) ) | 20:25 |
@jim:acmegating.com | Clark: i'm thinking let's get 298 in the pipe then see where we are with the other thing | 20:26 |
@jpew:matrix.org | TL; DR: I want my test job to only test things from the current patchset | 20:26 |
@jim:acmegating.com | jpew: let's clarify do you really mean you want test only considering artifacts from the current queue item, not anything from changes it depends on? or do you just mean that you only want to use the "latest" of any particular artifact? | 20:27 |
@jpew:matrix.org | corvus: The former | 20:29 |
@jim:acmegating.com | jpew: does the build job by any chance take into account in some way (either via git checkouts or artifact reuse) the changes ahead of it? | 20:29 |
@jpew:matrix.org | corvus: I'm not specifically sure what you are asking, but it doesn't require anything from zuul's perspective.... there is a lot of caching between builds (think ccache), but they all start from scratch | 20:31 |
@jim:acmegating.com | jpew: given change A, and change B after it in the pipeline, when the 'build' job for change B runs, would it produce anything different if change A wasn't there? | 20:32 |
@jim:acmegating.com | jpew: anyway, if the answer to that is 'no', then it's a little bit of a warning flag that you may want to make sure you're getting the most out of gating. :) to directly answer your question, you can drop provides/requires from the test job, since provides/requires is used for making a linkage between queue items, not between jobs within a single queue item. the latter is done via 'dependencies'. | 20:37 |
@jpew:matrix.org | corvus: I think the answer is yes, but my mind is broken trying to think how it could possibly be no... | 20:38 |
@jim:acmegating.com | jpew: for a similar example in zuul's repo, the "zuul-build-image" or "zuul-upload-image" (they are effectively the same job, just tweaked for different pipelines) is your 'build' job, and the "zuul-quick-start" is your test job. but zuul-(build|upload)-image uses artifacts from changes ahead of it in the queue. so it has provides/requires to make sure it gets those. zuul-quick-start just has dependencies to make sure that it runs after the build job. | 20:39 |
@jim:acmegating.com | https://opendev.org/zuul/zuul/src/branch/master/.zuul.yaml | 20:39 |
@jpew:matrix.org | @corvus: Ah! Ok I have the dependencies, but I couldn't figure out how to get the artifacts from the "build" job to the "test" job, so I added the provides/requires | 20:40 |
@jim:acmegating.com | (* white lie: it also has a 'requires' line because it requires nodepool container images which are not built by the zuul-build-image job, so it actually does both things) | 20:40 |
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: i'm thinking let's get 298 in the pipe then see where we are with the other thing | 20:40 |
Ok, my only worry with this is that the zk clearing will likely "fix" the nodeset issue and we have no way of knowing if we'll see it again soon | ||
@jpew:matrix.org | I suppose the thing to do there is make the "build" and "test" job agree where the artifacts are stashed? | 20:42 |
@jim:acmegating.com | jpew: i thought dependencies also pass artifacts. but that is also an option. that's effectively what's happening in the zuul jobs -- they agree that the artifacts are in the buildset container registry. | 20:43 |
@jim:acmegating.com | Clark: ok, let's dig into that and keep it in mind | 20:44 |
@mhuin:matrix.org | > <@jim:acmegating.com> mhu: i think we could use your input on https://review.opendev.org/817172 | 20:59 |
I just reviewed, I think this is indeed a bug. Auth override in the token was tested only when the override is disabled in zuul's conf, from what I can tell | ||
@jim:acmegating.com | mhu: should we return a list of all tenants in that case? | 21:01 |
@mhuin:matrix.org | > <@jim:acmegating.com> mhu: should we return a list of all tenants in that case? | 21:04 |
No, the idea was that an operator would generate a token scoped to one or several tenants with the `zuul create-auth-token` command. The claim zuul.admin would then hold the list of tenants on which the token holder would be admin | ||
@jim:acmegating.com | mhu: oh got it. and i see your comment with the suggested fix. that all makes sense, thx. :) | 21:05 |
@mhuin:matrix.org | np! I'm just facepalming because I forgot to add a test for this obvious use case | 21:05 |
@mhuin:matrix.org | I only tested when the override is forbidden | 21:06 |
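A minimal Python sketch of the tenant-scoped check mhu describes, assuming the decoded token carries the tenant list under a nested `zuul` / `admin` claim. The function names and the example tenant names are hypothetical; this is not zuul-web's actual code.

```python
from typing import Any, List, Mapping


def admin_tenants(claims: Mapping[str, Any]) -> List[str]:
    """Return the tenants listed in the token's zuul.admin claim, if any."""
    zuul_claim = claims.get("zuul") or {}
    tenants = zuul_claim.get("admin") or []
    # Ignore anything that is not a plain tenant name.
    return [t for t in tenants if isinstance(t, str)]


def is_admin(claims: Mapping[str, Any], tenant: str) -> bool:
    """True only for tenants the token was explicitly scoped to."""
    return tenant in admin_tenants(claims)
```

With this shape, a token scoped to one or several tenants grants admin rights on exactly those tenants, and a token without the claim grants none, which is why returning a list of all tenants would be the wrong answer.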
-@gerrit:opendev.org- Clark Boylan proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817292: Make time estimation synchronous https://review.opendev.org/c/zuul/zuul/+/817292 | 21:07 | |
@clarkb:matrix.org | That is just a linter fix | 21:07 |
@goneri:matrix.org | Clark: I just did a restart of nodepool-launcher and I confirm it fixes the problem. | 21:29 |
@clarkb:matrix.org | ok we know then that removing the lock allows it to recover gracefully as expected. That means we should probably track down why the lock didn't get removed to fix this going forward | 21:30 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817298: Add tag to tag serialization https://review.opendev.org/c/zuul/zuul/+/817298 | 22:07 | |
@goneri:matrix.org | Clark: I think https://review.opendev.org/c/zuul/nodepool/+/817287 would be enough to avoid the situation. I'm not sure yet how to test it. | 22:08 |
@clarkb:matrix.org | it might but it doesn't really explain why it broke in the first place which bothers me :) | 22:19 |
@clarkb:matrix.org | we need https://review.opendev.org/c/zuul/nodepool/+/817114 to land before other nodepool fixes | 22:38 |
@clarkb:matrix.org | it fixes the boto deps situation | 22:38 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817292: Make time estimation synchronous https://review.opendev.org/c/zuul/zuul/+/817292 | 22:39 | |
@clarkb:matrix.org | hrm actually they made a new release and the pip dep solver might be able to solve it. But giving the dep solver help might also make installations quicker. Just less urgent now | 22:40 |