-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 858726: Fix CORS and endpoint in AWS log upload https://review.opendev.org/c/zuul/zuul-jobs/+/858726 | 00:15 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 858726: Fix CORS and endpoint in AWS log upload https://review.opendev.org/c/zuul/zuul-jobs/+/858726 | 00:30 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 859013: Azure: handle missing Zone https://review.opendev.org/c/zuul/nodepool/+/859013 | 03:46 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 855096: Tracing: implement span save/restore https://review.opendev.org/c/zuul/zuul/+/855096 | 06:46 | |
-@gerrit:opendev.org- Albin Vass proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 807806: Add "slots" to static node driver https://review.opendev.org/c/zuul/nodepool/+/807806 | 07:08 | |
-@gerrit:opendev.org- Per Wiklund proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 807806: Add "slots" to static node driver https://review.opendev.org/c/zuul/nodepool/+/807806 | 07:59 | |
-@gerrit:opendev.org- Simon Westphahl proposed: | 10:09 | |
- [zuul/zuul] 859066: Link span of queue item to trigger event span https://review.opendev.org/c/zuul/zuul/+/859066 | ||
- [zuul/zuul] 859067: Trace received Github events https://review.opendev.org/c/zuul/zuul/+/859067 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 10:17 | |
- [zuul/zuul] 859066: Link span of queue item to trigger event span https://review.opendev.org/c/zuul/zuul/+/859066 | ||
- [zuul/zuul] 859067: Trace received Github events https://review.opendev.org/c/zuul/zuul/+/859067 | ||
@caiquemello:matrix.org | > <@clarkb:matrix.org> I would check that your executor is accepting jobs. It is possible it may have tripped a governor and stopped running jobs | 11:52 |
Thank you Clark, I'm gonna take a look | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 859013: Azure: handle missing Zone https://review.opendev.org/c/zuul/nodepool/+/859013 | 14:14 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 858726: Fix CORS and endpoint in AWS log upload https://review.opendev.org/c/zuul/zuul-jobs/+/858726 | 15:12 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 857796: Remove support for Ansible 2 https://review.opendev.org/c/zuul/zuul/+/857796 | 17:32 | |
@clarkb:matrix.org | zuulians https://review.opendev.org/c/zuul/zuul-jobs/+/858961 seems to be happy in testing. I'm happy to help keep an eye on that today if we land it today. But I'm also happy to wait a bit if we prefer (I think it will affect most zuul jobs out there) | 17:58 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 858741: Demote "Starting/Finished cleanup" log entries to debug https://review.opendev.org/c/zuul/nodepool/+/858741 | 18:48 | |
-@gerrit:opendev.org- lotorev vitaly proposed wip: [zuul/project-config] 859151: Update link to zuul gating docs https://review.opendev.org/c/zuul/project-config/+/859151 | 21:10 | |
-@gerrit:opendev.org- lotorev vitaly proposed wip: [zuul/project-config] 859151: Update link to zuul gating docs https://review.opendev.org/c/zuul/project-config/+/859151 | 21:10 | |
-@gerrit:opendev.org- lotorev vitaly marked as active: [zuul/project-config] 859151: Update link to zuul gating docs https://review.opendev.org/c/zuul/project-config/+/859151 | 21:11 | |
-@gerrit:opendev.org- lotorev vitaly proposed wip: [zuul/zuul] 859152: Update link to zuul gating docs in reference pipeline https://review.opendev.org/c/zuul/zuul/+/859152 | 21:13 | |
-@gerrit:opendev.org- lotorev vitaly proposed wip: [zuul/zuul] 859152: Update link to zuul gating docs in reference pipeline https://review.opendev.org/c/zuul/zuul/+/859152 | 21:16 | |
-@gerrit:opendev.org- lotorev vitaly marked as active: [zuul/zuul] 859152: Update link to zuul gating docs in reference pipeline https://review.opendev.org/c/zuul/zuul/+/859152 | 21:16 | |
@jim:acmegating.com | i've seen the nodepool-functional-container-openstack-release job fail twice with a timeout, and the underlying cause is an openstack error: [instance: 770f1397-0550-4edc-8815-caa7f720b6d7] Failed to allocate network(s): nova.exception.ExternalNetworkAttachForbidden: It is not allowed to create an interface on external network db590dd4-5493-4ffc-a658-13bad70e96d7 | 22:40 |
@jim:acmegating.com | https://zuul.opendev.org/t/zuul/build/74fab5fdddee4c3d88e71e40ad6795a7 is one failure | 22:40 |
@jim:acmegating.com | i do not know how to debug that, so if someone who is more knowledgeable about openstack would like to look into that, that would be great | 22:41 |
@clarkb:matrix.org | My first thought is "wow I thought that was explicitly allowed and we do that on the inmotion cloud to avoid wasting IPs for router interfaces" | 22:41 |
@jim:acmegating.com | (it's the two most recent builds that failed like that; i'm not sure if we're at 100% failure rate on that currently after a recent openstack change, or we just got lucky) | 22:41 |
@jim:acmegating.com | Clark: maybe some default openstack/devstack policy change? | 22:42 |
@jim:acmegating.com | (i'm just aping words i've heard before, i don't really know if that's a thing) | 22:42 |
@clarkb:matrix.org | The message comes from nova somewhere. Let me see if the nova git logs indicate anything useful | 22:43 |
@jim:acmegating.com | build history: https://zuul.opendev.org/t/zuul/builds?job_name=nodepool-functional-container-openstack-release&project=zuul/nodepool | 22:44 |
@jim:acmegating.com | so if it's a change, it's very recent | 22:45 |
@clarkb:matrix.org | OpenStack is doing a bunch of release stuff so it is possible | 22:45 |
@jim:acmegating.com | "Test Nodepool containers and OpenStack, with released projects" is the job description | 22:46 |
@jim:acmegating.com | so i guess a dependency release in the last 12 hours could do it? | 22:46 |
@clarkb:matrix.org | ya | 22:47 |
@clarkb:matrix.org | https://opendev.org/openstack/nova/src/branch/master/nova/network/neutron.py#L603-L613 I think it may be policy related based on that | 22:47 |
@clarkb:matrix.org | (that exception's string matches the error we got) | 22:48 |
@clarkb:matrix.org | Nova itself doesn't seem to set a policy outside of its tests and that string doesn't show up in devstack | 22:49 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 859170: Make nodepool-functional-container-openstack-release non-voting https://review.opendev.org/c/zuul/nodepool/+/859170 | 22:51 | |
@jim:acmegating.com | Clark: yeah, i'm wondering how devstack is working but we aren't | 22:52 |
@clarkb:matrix.org | Also that code hasn't appeared to have changed recently in nova which makes me think it must be the policy config that is different | 22:53 |
@clarkb:matrix.org | hrm it actually looks like devstack is installing master based on the logs | 22:56 |
@clarkb:matrix.org | I think the -release in that job is for dib and glean not openstack + devstack | 22:56 |
@jim:acmegating.com | huh. maybe we can update the description when this is through | 22:57 |
@jim:acmegating.com | hrm, given that it takes 1.5 hours to time out that job, maybe we should remove it or give it a 45m timeout for now? | 23:04 |
@jim:acmegating.com | the median successful time is 36m. some successes are longer, but i think 45m would still give us data about whether the job is fixed externally without hindering ongoing nodepool maintenance | 23:05 |
@clarkb:matrix.org | that seems reasonable. Though devstack alone takes about 20 minutes iirc | 23:06 |
@clarkb:matrix.org | https://review.opendev.org/c/openstack/nova/+/849209 is suspicious but seems to have merged a month ago | 23:06 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 859170: Make nodepool-functional-container-openstack-release non-voting https://review.opendev.org/c/zuul/nodepool/+/859170 | 23:06 | |
@clarkb:matrix.org | wow there are even comments that nova shouldn't do this check .... https://review.opendev.org/c/openstack/nova/+/849209/6/nova/policies/servers.py line 315 | 23:07 |
@clarkb:matrix.org | corvus: what log file did that "it is not allowed" message come out of? | 23:08 |
@clarkb:matrix.org | I'm trying to find it in the example job you linked and having trouble | 23:08 |
@jim:acmegating.com | syslog | 23:08 |
@jim:acmegating.com | something similar was also reported to nodepool via sdk in the launcher log | 23:08 |
@clarkb:matrix.org | ya the launcher log is much more terse | 23:09 |
@clarkb:matrix.org | corvus: I think I've confirmed that both a successful and a failed job ran with the same version of nova installed: https://zuul.opendev.org/t/zuul/build/9097b115a25c4d2d96295cb1e7d301f8/log/job-output.txt#8968 vs https://zuul.opendev.org/t/zuul/build/74fab5fdddee4c3d88e71e40ad6795a7/log/job-output.txt#8858 | 23:21 |
@clarkb:matrix.org | first is success, second is failure | 23:21 |
@jim:acmegating.com | so this might be intermittent? or the cause may not be nova? | 23:22 |
@clarkb:matrix.org | ya | 23:22 |
@clarkb:matrix.org | I pinged gmann in the nova channel about it since gmann seems to be pushing the rbac stuff along and this seems very related | 23:31 |
@clarkb:matrix.org | corvus: gmann explains that a project admin is an admin so the update to the rbac policy should be equivalent. On further digging the successful jobs have the same rbac failure and tracebacks | 23:49 |
@clarkb:matrix.org | But the failed jobs also have 'nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed' | 23:49 |
@clarkb:matrix.org | I think that this may be the fatal error: https://zuul.opendev.org/t/zuul/build/74fab5fdddee4c3d88e71e40ad6795a7/log/syslog?severity=0#84665-84692 | 23:51 |
@clarkb:matrix.org | and that appears to be waiting for 5 minutes for libvirt to make a virtual interface and it isn't happening in that amount of time | 23:51 |
@clarkb:matrix.org | corvus: and I think we'd need to collect libvirt logs to see where libvirt is getting stuck | 23:53 |
@jim:acmegating.com | so 2 new things to consider: libvirt versions, and maybe opendev cloud providers? | 23:54 |
@clarkb:matrix.org | yes | 23:54 |
-@gerrit:opendev.org- Zuul merged on behalf of Dr. Jens Harbott: [zuul/zuul] 834671: Handle reviews by anonymous github users https://review.opendev.org/c/zuul/zuul/+/834671 | 23:56 | |
@jim:acmegating.com | 2 failures on iweb and ovh | 23:56 |
@jim:acmegating.com | most recent success in iweb | 23:56 |
@jim:acmegating.com | so no smoking gun there | 23:56 |