Tuesday, 2022-11-08

-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: [wip] enable-kubernetes: check pod is actually running https://review.opendev.org/c/zuul/zuul-jobs/+/86381000:13
@jim:acmegating.comianw: i think the final version of that check may need some retries (ie, if 5s isn't enough for it to start the pod).  maybe you could make the status check an "until"?00:15
@iwienand:matrix.orgyep could be true -- so far it hasn't triggered due to my bugs :)  00:16
@iwienand:matrix.orgi held a node and did get it "running" ```test   1/1     Running   0          14m```00:17
@jim:acmegating.comyeah, and may even work now-ish -- just it's probably racy if it does00:17
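The "until" suggestion above amounts to polling the pod status a bounded number of times instead of relying on a single fixed 5s wait. A minimal Python sketch of that retry pattern follows; the name `wait_until`, its parameters, and the caller-supplied `check` callable are illustrative assumptions, not code from zuul-jobs.

```python
import time


def wait_until(check, retries=10, delay=1.0, sleep=time.sleep):
    """Poll check() until it returns a truthy value or retries run out.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `sleep` is injectable so the loop can be exercised
    without real waiting.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        # Only wait if another attempt is still to come.
        if attempt < retries:
            sleep(delay)
    return None
```

A caller would pass something like a function that reports whether the pod's status is "Running"; a `None` result then means the pod never came up within the retry budget, rather than that it merely lost a race.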
@iwienand:matrix.orgi think that the problem *may* be that on jammy the containertools-networking-plugin is system packaged, while on other distros it isn't?  and thus the crio.conf file needs an update to point to the packaged tools, as it defaults to somewhere in /opt00:18
@iwienand:matrix.org... or at least *a* problem ...00:18
@iwienand:matrix.orgzuul-jobs-test-ensure-kubernetes-crio-ubuntu-jammy (https://zuul.opendev.org/t/zuul/build/7dd2e3ad1e2e423dbeb8840ad57cd724) seems to pass with basically that update.  but the docker one didn't00:22
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: [wip] enable-kubernetes: check pod is actually running https://review.opendev.org/c/zuul/zuul-jobs/+/86381000:47
@iwienand:matrix.orgok, i think the docker bits come down to roughly the same issue03:10
@iwienand:matrix.orgthe cri-dockerd shim is started with ```/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=cni --cni-bin-dir=/opt/cni/bin```03:10
@iwienand:matrix.orgi can not for the life of me figure out where --cni-bin-dir is being set for that03:10
@iwienand:matrix.orghttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-kubernetes/tasks/minikube.yaml#L87 is where we set it up03:12
@iwienand:matrix.orgthose command-line args aren't set by https://raw.githubusercontent.com/Mirantis/cri-dockerd/v0.2.6/packaging/systemd/cri-docker.service03:13
@iwienand:matrix.orgis it re-exec()ing itself or something?03:14
@michael_kelly_anet:matrix.org```03:30
Ansible output: b'Exception in thread Thread-2 (_read_log):'
Ansible output: b'Traceback (most recent call last):'
Ansible output: b' File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner'
Ansible output: b' self.run()'
Ansible output: b' File "/usr/local/lib/python3.10/threading.py", line 953, in run'
Ansible output: b' self._target(*self._args, **self._kwargs)'
Ansible output: b' File "/var/lib/zuul/ansible/6/zuul/ansible/callback/zuul_stream.py", line 202, in _read_log'
Ansible output: b' self._zuul_console_version = int(buff)'
Ansible output: b"ValueError: invalid literal for int() with base 10: ''"
Ansible output: b'skipping: [builder] => {"changed": false, "skip_reason": "Conditional result was False"}'
Ansible output: b''
```
I keep seeing this guy in the logs for my executor pod - running with a single nodepool pool with the kubernetes driver running in pod mode. This is making the console attach not work - I fixed this a month or so ago in my previous deployment, but can't recall what I did.
@michael_kelly_anet:matrix.orgAnyone seen this before?03:30
@iwienand:matrix.org> <@iwienand:matrix.org> is it re-exec()ing itself or something?03:33
ok i am not totally insane -- https://github.com/kubernetes/minikube/blob/1ade1e23997b98227a436906ff3183247e79422b/pkg/minikube/cruntime/docker.go#L675 ... minikube is overwriting the service file :/
@iwienand:matrix.orgMichael Kelly: hrm, interesting.  the remote side should be sending a version request ... it looks like it's sending a null?03:34
@michael_kelly_anet:matrix.orgYea, there's something missing on the pod side.  That much I remember.03:35
@michael_kelly_anet:matrix.orgJust scratching my head thinking about what.03:35
@michael_kelly_anet:matrix.orgThe process whereby I deduced the problem last time out is not apparent to me.03:35
@michael_kelly_anet:matrix.orgIs there a list of base expectations for the nodes enumerated somewhere?03:35
@iwienand:matrix.orgthis sits in a loop waiting for a connection from the remote side -> https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/callback/zuul_stream.py#L17903:36
@michael_kelly_anet:matrix.orgSorry, that's the executor side you mean?03:36
@michael_kelly_anet:matrix.orgMostly I was hoping that the log snippet might seem familiar :)03:37
@iwienand:matrix.orgthe mechanism for this is described @ https://zuul-ci.org/docs/zuul/latest/developer/ansible.html#capturing-live-command-output03:38
@michael_kelly_anet:matrix.orgto clarify: zuul-console is what the pod is emitting things back to?03:41
@michael_kelly_anet:matrix.orgzuul_console, rather03:42
@iwienand:matrix.orgso what happens is that we start executing a command on the remote node, and the executor then sends to the zuul_console listening on port 19885 a request to stream back the live logs of that command03:42
@iwienand:matrix.orgso what this is saying, is, I think, the executor is sending something to the node that is running on port 19885 and it is getting back a blank response03:43
@michael_kelly_anet:matrix.orgGotcha, so we should be spinning up the zuul_console daemon on the remote node03:43
@iwienand:matrix.orgyes, starting zuul_console: on the remote node will start this listening on 19885.  however, the fact that it dies like this is probably a bug03:44
@iwienand:matrix.orgwe really assume that we will get back either '[Zuul] Log not found' or a correct version response at https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/callback/zuul_stream.py#L19603:44
@michael_kelly_anet:matrix.orgGotcha.03:44
@michael_kelly_anet:matrix.orgMy recollection is that there was <something> missing on the base image for the node.03:45
@iwienand:matrix.orgif we get anything else, we should probably emit something more useful03:45
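The traceback earlier in the log comes from calling `int()` on an empty reply. A hedged Python sketch of the more defensive handling suggested here is below. It is not the actual zuul_stream.py code: the function name, the return shape, and the choice to treat '[Zuul] Log not found' as "no version, no problem" are all assumptions made for illustration.

```python
LOG_NOT_FOUND = "[Zuul] Log not found"


def parse_version_reply(buff):
    """Interpret the text received in reply to a version request.

    Returns (version, problem): version is an int or None, and problem
    is a human-readable string or None when nothing looks wrong.
    """
    text = buff.strip()
    if text.startswith(LOG_NOT_FOUND):
        # Assumption: this reply is expected and carries no version.
        return None, None
    if not text:
        # The case from the traceback: int('') raises ValueError.
        return None, "empty reply to version request"
    try:
        return int(text), None
    except ValueError:
        return None, "unexpected reply to version request: %r" % text
```

With this shape the caller can log the problem string instead of letting the reader thread die with a bare `ValueError`.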
@iwienand:matrix.orgit's interesting though that it could connect() ... but then doesn't get back the right response ... is something else listening on that port; or a firewall thing?03:46
@iwienand:matrix.orgif zuul_console isn't running, i'd expect it would just fail to start @ https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/callback/zuul_stream.py#L18003:46
@michael_kelly_anet:matrix.orgYea, I don't believe so.  I think I need to get my log collector up and going again and see if there's something obvious in there.03:48
@michael_kelly_anet:matrix.orgI think that was my order of operations last time.03:48
@michael_kelly_anet:matrix.orgI really need to be in the habit of writing these things out more as I go.03:48
@iwienand:matrix.orgwhat happens if you telnet <remote node> 19885 from the executor?03:49
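The telnet check can also be scripted. This is a small Python sketch using only the standard library; the helper name `can_connect` and its defaults are illustrative, with 19885 taken from the port mentioned above.

```python
import socket


def can_connect(host, port=19885, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A successful connect only shows that something accepted the connection. When the path goes through a forwarder, that may be the forwarder itself rather than zuul_console on the node.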
@michael_kelly_anet:matrix.orgShould the zuul_console be running all the time?03:51
@iwienand:matrix.org> <@michael_kelly_anet:matrix.org> Should the zuul_console be running all the time?03:55
it basically hangs around for the lifespan of the node. if it's not started, zuul_console will daemonize it. on a static node, it will just then sit there
@michael_kelly_anet:matrix.orgDefinitely not running in the pod :)03:56
@iwienand:matrix.org> <@iwienand:matrix.org> ok i am not totally insane -- https://github.com/kubernetes/minikube/blob/1ade1e23997b98227a436906ff3183247e79422b/pkg/minikube/cruntime/docker.go#L675 ... minikube is overwriting the service file :/04:15
i have filed an issue on this https://github.com/kubernetes/minikube/issues/15320
@g_gobi:matrix.org * Hi,04:19
https://zuul-ci.org/docs/zuul/latest/config/project.html#attr-project.%3Cpipeline%3E.fail-fast
I have a doubt about this one. Whether it will abort the running builds from the current build-set or all the running build-sets?
Also, what about `reporting: false` jobs? Whether it will abort the builds when non-reporting jobs?
@michael_kelly_anet:matrix.org * ianw: I'm guessing that04:59
```
2022-11-08 04:56:00,664 DEBUG zuul.ExecutorServer: E1108 04:55:16.484870 76275 portforward.go:331] an error occurred forwarding 35761 -> 19885: error forwarding port 19885 to pod cfec3d357cdeaea641ea4295d6fa9b27aa2e6dfba6bc1e7021e64f585301b137, uid : exit status 1: 2022/11/08 04:55:16 socat[33617] E connect(5, AF=2 127.0.0.1:19885, 16): Connection refused
```
is related to my issue
@michael_kelly_anet:matrix.orgAt least it implies that there's nothing alive on the other end05:00
@iwienand:matrix.org... interesting05:09
@iwienand:matrix.orgso i wonder if it's accepting the connection, trying to forward and failing ... but it still seems odd the other side gets a connect() that works05:10
@michael_kelly_anet:matrix.orgMy recollection is that kubectl port-forward might puke back at you if you don't have an actual port open there05:11
@g_gobi:matrix.orgAny idea about this https://matrix.to/#/!yuuvjJSOEGSfTzxOjK:opendev.org/$3BhQka4pKfo1LdEO2Pb3AA0DmdooF-ckqiJNjAmzips?via=matrix.org&via=fricklercloud.de&via=opendev.org?05:14
@iwienand:matrix.orgtdlaw: i don't really know, but https://opendev.org/zuul/zuul/src/branch/master/zuul/manager/__init__.py#L1843 -> https://opendev.org/zuul/zuul/src/branch/master/zuul/manager/__init__.py#L1787 which walks the buildset.05:17
@iwienand:matrix.orgso i would say the buildset is what that means.  i'm sure a small clarification to the docs would be welcome :)05:18
@g_gobi:matrix.orgThanks ianw: 05:24
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul-jobs] 864004: pypi-upload: support twine --skip-existing option https://review.opendev.org/c/zuul/zuul-jobs/+/86400413:16
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul-jobs] 864004: pypi-upload: support twine --skip-existing option https://review.opendev.org/c/zuul/zuul-jobs/+/86400413:21
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul-jobs] 864004: pypi-upload: support twine --skip-existing option https://review.opendev.org/c/zuul/zuul-jobs/+/86400413:26
@q:fricklercloud.denodepool CI seems broken, failures on https://review.opendev.org/c/zuul/nodepool/+/861947 and https://review.opendev.org/c/zuul/nodepool/+/863812 look the same and the latter doesn't touch aws at all13:58
@fungicide:matrix.org> <@q:fricklercloud.de> nodepool CI seems broken, failures on https://review.opendev.org/c/zuul/nodepool/+/861947 and https://review.opendev.org/c/zuul/nodepool/+/863812 look the same and the latter doesn't touch aws at all15:09
looks like it began sometime between 2022-10-31 (last successful build) and 2022-11-03 (first failed build)
@fungicide:matrix.orgof the failing unit tests in those builds, one of the tests isn't related to the aws driver from what i can tell, so the problem may run deeper15:12
@fungicide:matrix.orglooks like boto3 1.26.0 was released in that timeframe, and the aws driver backtraces are bubbling up from within boto: https://github.com/boto/boto3/blob/develop/CHANGELOG.rst15:19
@jim:acmegating.comfungi: i believe that moto has added instance profile support, something that they had not previously mocked before, so our tests can/should increase fidelity to use that.  i think i know what needs to be done and can take care of it.15:20
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed wip: [zuul/nodepool] 864015: DNM: pin boto3 to pinpoint recent test failures https://review.opendev.org/c/zuul/nodepool/+/86401515:21
@fungicide:matrix.orgcorvus: if you do, that beats me fumbling around ;)15:23
@fungicide:matrix.orgi'm mostly curious at this point whether pinning boto3 makes the two aws driver tests pass, but also whether it makes the non-aws test failure pass as well or if we have more than one problem15:24
@jim:acmegating.comfungi: i think the issue may be moto rather than boto3 (but maybe pinning boto3 would implicitly constrain moto)15:26
@fungicide:matrix.orgyeah, it does appear that moto 4.0.9 was also released in that window15:28
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 864017: Fix AWS instance_profile tests for moto v4 https://review.opendev.org/c/zuul/nodepool/+/86401715:39
@jim:acmegating.comthat should take care of it15:39
@gervasx:matrix.orgHi all, I'm new to Zuul and I'm trying to configure zuul bot and projects in Github. Unfortunately I'm not able to configure check and gate steps for protected branches. Do you know if there are some specific steps needed or some guidelines available to set up the environment?16:17
@jim:acmegating.comgervas: there is a configurator on the acme gating site that may help you ensure you got all the steps: https://acmegating.com/acme-enterprise-zuul/#start16:24
@jim:acmegating.comgervas: it's based on the documentation here: https://zuul-ci.org/docs/zuul/latest/drivers/github.html16:25
@jim:acmegating.comgervas: and this is a relevant setting worth reading up on: https://zuul-ci.org/docs/zuul/latest/tenants.html#attr-tenant.exclude-unprotected-branches16:27
@clarkb:matrix.org> <@gervasx:matrix.org> Hi all, I'm new to Zuul and I'm trying to configure zuul bot and projects in Github. Unfortunately I'm not able to configure check and gate steps for protected branches. Do you know if there are some specific steps needed or some guidelines available to set up the environment?16:27
* I'm not super familiar with how people set up Zuul with Github (we use Gerrit), but it might help if you can provide more specifics on what exactly is failing?
@gervasx:matrix.orgThanks for the information corvus. I already performed these steps but in the protected branch no configuration is present. Maybe I'm missing something, I don't know..16:38
@jim:acmegating.comgervas: one thing to try: do a `zuul-scheduler full-reconfigure` on the scheduler.  it's possible for zuul to miss the transition event from unprotected<->protected (it usually will catch up on a busy system, but a good idea to run that to eliminate doubt).16:40
@jim:acmegating.comgervas: check the scheduler debug logs (run the scheduler with debug logging) and see if it says anything about what branches it's loading config from and why16:40
@gervasx:matrix.orgThanks for the help. I will try this solution16:53
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 864017: Fix AWS instance_profile tests for moto v4 https://review.opendev.org/c/zuul/nodepool/+/86401718:53
@iwienand:matrix.orgI *think* I have something that will make zuul-jobs-test-ensure-kubernetes-docker-ubuntu-jammy work.  But before I go too far -- do we think it is worth keeping the ensure-kubernetes with docker-runtime path up?  this might have made more sense when docker was a native runtime, but now that you need the cri-dockerd daemon in between, the whole thing feels a lot more fragile.  is this something people are actively interested in?  it seems the idea is more to be able to bring up a CI kubernetes, but the actual runtime seems fairly unimportant19:24
@jim:acmegating.com * ianw: something to check is whether that would cause us not to test the buildset registry setup with k8s+docker.  in other words, if we drop role support for that, are we dropping support for buildset-registry-based speculative container execution *with docker*?  (sorry, important edit: "with docker")19:36
@jim:acmegating.comianw: i haven't dug into it, so i don't know if the cri-dockerd configuration speculative container path would go through the "docker" or "crio" path... i would assume docker, which i think means we would need to be able to set that up in order to test that path.19:37
@jim:acmegating.combut that's a whole lot of assuming :|19:37
@iwienand:matrix.org... i will have to research that.  essentially i guess kubernetes is telling the kubelet "pull this image", which the kubelet says to the runtime via CRI "get this image" and the question is, what then actually pulls the image ... docker or <something??> ?19:43
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [zuul/zuul-jobs] 864004: pypi-upload: support twine --skip-existing option https://review.opendev.org/c/zuul/zuul-jobs/+/86400420:04
-@gerrit:opendev.org- Michael Kelly proposed:21:36
- [zuul/zuul-operator] 863572: bug: Properly parameterize zookeeper-client-tls everywhere https://review.opendev.org/c/zuul/zuul-operator/+/863572
- [zuul/zuul-operator] 863439: doc: Re-write install doc to use helm chart https://review.opendev.org/c/zuul/zuul-operator/+/863439
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: [wip] enable-kubernetes: check pod is actually running https://review.opendev.org/c/zuul/zuul-jobs/+/86381022:37
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/86179922:39
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: [wip] enable-kubernetes: check pod is actually running https://review.opendev.org/c/zuul/zuul-jobs/+/86381022:46
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: [wip] enable-kubernetes: check pod is actually running https://review.opendev.org/c/zuul/zuul-jobs/+/86381023:05
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:23:12
- [zuul/zuul] 861495: Parallelize some pipeline refresh ops https://review.opendev.org/c/zuul/zuul/+/861495
- [zuul/zuul] 864069: Fix race in merger shutdown https://review.opendev.org/c/zuul/zuul/+/864069
@clarkb:matrix.orgianw: re k8s stuff can we just switch to the openbuild packages as a quick fix? (I haven't kept up too closely but did read the issue you filed)23:15
@clarkb:matrix.orgthose packages do exist now aiui23:15
@jim:acmegating.com * Clark: 864069 _may_ (i strongly emphasize the _may_) be related to the timeouts you were looking at.  similar presentation at least.23:16
@clarkb:matrix.orgcorvus: it can't hurt to fix a race and see if other things get better :)23:16
@iwienand:matrix.orgClark: it looks like the workaround of overriding the override will work to keep cri-docker going another day ... is open build like ppa's for suse?23:17
@clarkb:matrix.orgianw: ya OBS is open build service which is basically PPAs but by suse23:19
@clarkb:matrix.orgyou can run the service yourself too if you like. Not sure if anyone has ever done that with lp PPAs23:20
@clarkb:matrix.orgcorvus: out of curiosity any reason to stack that race fix on top of those other changes? I'll work on reviewing the stack shortly23:20
@iwienand:matrix.orgyeah, so i think the situation is the kubic repos have stopped building things like the containernetwork-plugins, which are now in ubuntu upstream.  but they use different paths, and the cri-docker shim is assuming the kubic package paths.  at least i think that's how it got into this situation; it all seems very mixed up23:21
@jim:acmegating.comClark: the timing changes in 861495 triggered it fairly reliably on my system (which is how i was able to track it down); and i didn't want to rebase the parents.  i can move it out and depends-on if you want to try to fast-track it.23:22
@clarkb:matrix.orgianw: I think the ubuntu packages on obs have lagged the ubuntu releases but are there now? I could be mistaken though23:22
@clarkb:matrix.orgcorvus: no I think that's fine. I was mostly curious if there was a specific reason for that. Sounds like reproducibility which is a good reason23:22
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: [wip] enable-kubernetes: check pod is actually running https://review.opendev.org/c/zuul/zuul-jobs/+/86381023:25
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 863810: enable-kubernetes: Fix jammy install, improve pod test https://review.opendev.org/c/zuul/zuul-jobs/+/86381023:55
