openstackgerrit | Merged zuul/zuul master: bindep: add bzip2 to all platforms https://review.opendev.org/715041 | 00:18 |
*** zxiiro has quit IRC | 00:30 | |
*** ysandeep|away is now known as ysandeep | 00:43 | |
*** jamesmcarthur has quit IRC | 00:46 | |
*** tosky has quit IRC | 00:52 | |
*** jamesmcarthur has joined #zuul | 01:01 | |
*** rlandy has joined #zuul | 01:05 | |
*** rlandy has quit IRC | 01:07 | |
*** jamesmcarthur has quit IRC | 01:23 | |
*** ysandeep is now known as ysandeep|rover | 01:25 | |
*** Goneri has quit IRC | 01:30 | |
*** bhavikdbavishi has joined #zuul | 02:42 | |
*** swest has quit IRC | 02:54 | |
*** bhavikdbavishi has quit IRC | 02:54 | |
*** leathekd has quit IRC | 02:57 | |
*** swest has joined #zuul | 03:10 | |
*** jamesmcarthur has joined #zuul | 03:18 | |
*** jamesmcarthur has quit IRC | 03:22 | |
*** jamesmcarthur has joined #zuul | 03:27 | |
*** jamesmcarthur has quit IRC | 03:33 | |
*** jamesmcarthur has joined #zuul | 03:34 | |
*** bhavikdbavishi has joined #zuul | 03:53 | |
*** bhavikdbavishi1 has joined #zuul | 03:56 | |
*** bhavikdbavishi has quit IRC | 03:57 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:57 | |
*** jamesmcarthur has quit IRC | 04:23 | |
*** jamesmcarthur has joined #zuul | 04:26 | |
*** jamesmcarthur has quit IRC | 04:34 | |
*** jamesmcarthur has joined #zuul | 04:34 | |
*** jamesmcarthur has quit IRC | 04:43 | |
*** jamesmcarthur has joined #zuul | 04:44 | |
*** sgw has quit IRC | 04:59 | |
*** y2kenny has quit IRC | 05:09 | |
*** evrardjp has quit IRC | 05:36 | |
*** evrardjp has joined #zuul | 05:36 | |
*** reiterative has quit IRC | 05:40 | |
*** reiterative has joined #zuul | 05:40 | |
*** jamesmcarthur has quit IRC | 05:55 | |
*** jamesmcarthur has joined #zuul | 05:57 | |
*** jamesmcarthur has quit IRC | 06:02 | |
*** jamesmcarthur has joined #zuul | 06:07 | |
*** jamesmcarthur has quit IRC | 06:30 | |
*** dpawlik has joined #zuul | 07:22 | |
*** bhavikdbavishi has quit IRC | 07:27 | |
*** ysandeep|rover is now known as ysandeep|rover|l | 07:35 | |
*** ysandeep|rover|l is now known as ysandeep|roveraf | 07:35 | |
*** ysandeep|roveraf is now known as ysandeep|rover|l | 07:36 | |
*** bhavikdbavishi has joined #zuul | 07:54 | |
*** evrardjp has quit IRC | 07:57 | |
*** bolg has quit IRC | 07:59 | |
*** evrardjp has joined #zuul | 08:01 | |
*** bhavikdbavishi has quit IRC | 08:04 | |
*** guilhermesp has quit IRC | 08:06 | |
*** vblando has quit IRC | 08:06 | |
*** dmellado has quit IRC | 08:06 | |
*** guilhermesp has joined #zuul | 08:07 | |
*** dmellado has joined #zuul | 08:10 | |
*** bolg has joined #zuul | 08:17 | |
*** bhavikdbavishi has joined #zuul | 08:19 | |
*** bolg has quit IRC | 08:26 | |
*** bolg has joined #zuul | 08:29 | |
*** toabctl has quit IRC | 08:30 | |
*** bolg has quit IRC | 08:33 | |
*** ysandeep|rover|l is now known as ysandeep|rover | 08:35 | |
*** bolg has joined #zuul | 08:35 | |
*** jpena|off is now known as jpena | 08:53 | |
*** tosky has joined #zuul | 09:00 | |
*** toabctl has joined #zuul | 09:14 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** bhavikdbavishi has joined #zuul | 09:31 | |
*** wxy-xiyuan has quit IRC | 09:39 | |
*** wxy-xiyuan has joined #zuul | 09:39 | |
*** sshnaidm is now known as sshnaidm|afk | 09:56 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Install unzip on all platforms https://review.opendev.org/714919 | 10:16 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Update docker based test setup for zk auth https://review.opendev.org/657096 | 10:18 |
*** bhavikdbavishi has quit IRC | 10:27 | |
*** bhavikdbavishi has joined #zuul | 10:28 | |
*** bhavikdbavishi1 has joined #zuul | 10:31 | |
*** bhavikdbavishi has quit IRC | 10:33 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 10:33 | |
*** sugaar has quit IRC | 10:38 | |
*** sugaar has joined #zuul | 10:39 | |
*** avass has quit IRC | 11:14 | |
*** sshnaidm|afk is now known as sshnaidm | 11:25 | |
*** bhavikdbavishi has quit IRC | 11:55 | |
tristanC | when y2kenny comes back, iirc, `bwrap: Can't mount proc` can be caused by either the lack of userns, or that bwrap is not root setuid | 11:57 |
*** jpena is now known as jpena|lunch | 11:58 | |
*** Goneri has joined #zuul | 12:14 | |
*** rfolco has joined #zuul | 12:20 | |
*** rlandy has joined #zuul | 12:23 | |
*** bhavikdbavishi has joined #zuul | 12:32 | |
zbr | anyone aware of TypeError: 'NoneType' object is not iterable from zuul_sphinx/zuul.py", line 109, in parse_zuul_d ? | 12:36 |
zbr | mainly it happened when I removed layout.yaml because jobs were already defined inside projecta.yaml, at https://review.rdoproject.org/r/#/c/26005/ | 12:38 |
zbr | zuul had no problem with that, but tox-docs choked. | 12:38 |
zbr | i will propose a patch | 12:41 |
zbr | i see that the code from master behaves differently, giving an error like: ('File %s in Zuul dir is empty', '/Users/ssbarnea/c/rdo/rdo-jobs/zuul.d/layout.yaml') | 12:46 |
mordred | zbr: interesting. any reason to not just delete the file instead of removing the contents but leaving it there? | 12:52 |
zbr | mordred: it was my mistake, i left the file empty. | 12:53 |
zbr | sorted once i removed the file. | 12:54 |
zbr | but it would be a good idea to make a new release of zuul-sphinx, so we can benefit from the better error message | 12:54 |
zbr | now i do not know if an empty yaml file can be considered valid or not. | 12:54 |
mordred | yaml loads empty files as a None object | 12:56 |
mordred | but - yeah, if there's a better error message in master that's good | 12:56 |
fungi | using yaml.load() on /dev/null returns None, yeah | 12:57 |
mordred | looks like the error message fix is the only substantive change in zuul-sphinx atm - so maybe an 0.4.2 | 12:58 |
fungi | wfm | 13:00 |
*** jpena|lunch is now known as jpena | 13:01 | |
*** y2kenny has joined #zuul | 13:08 | |
mnaser | i'd love feedback on ideas for how i can test https://review.opendev.org/#/c/715045/1 | 13:23 |
*** bhavikdbavishi has quit IRC | 13:26 | |
*** bhavikdbavishi has joined #zuul | 13:27 | |
*** hashar has joined #zuul | 13:28 | |
*** zxiiro has joined #zuul | 13:29 | |
*** bhavikdbavishi has quit IRC | 13:38 | |
*** armstrongs has joined #zuul | 13:44 | |
mordred | zuul-maint: we landed an update to python-builder/python-base to remove the installation of recommends. let me know if you see any image build failures. | 13:46 |
*** jcapitao has joined #zuul | 13:51 | |
*** y2kenny has quit IRC | 13:58 | |
*** y2kenny has joined #zuul | 14:03 | |
*** sgw has joined #zuul | 14:06 | |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Pin docker images to 3.7 explicitly https://review.opendev.org/715043 | 14:08 |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Add libc6-dev to bindep https://review.opendev.org/715216 | 14:08 |
mordred | mnaser: ^^ speak of the devil :) | 14:09 |
mnaser | aha | 14:09 |
*** ysandeep|rover is now known as ysandeep|away | 14:10 | |
*** guilhermesp has quit IRC | 14:27 | |
*** guilhermesp has joined #zuul | 14:27 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-registry master: Use versioned python base images https://review.opendev.org/715225 | 14:28 |
*** chandan_kumar has joined #zuul | 14:29 | |
*** jkt has quit IRC | 14:29 | |
*** chandankumar has quit IRC | 14:29 | |
*** jkt has joined #zuul | 14:30 | |
*** sshnaidm has quit IRC | 14:30 | |
*** corvus has quit IRC | 14:30 | |
*** sshnaidm has joined #zuul | 14:31 | |
*** corvus has joined #zuul | 14:31 | |
corvus | mnaser: i don't have a good idea of how to test that; i don't think we can fake out the uri part of it. i think the only way to test would be a completely synthetic test or adding testing conditionals to the role and supplying test data | 14:33 |
*** guilhermesp has quit IRC | 14:34 | |
*** guilhermesp has joined #zuul | 14:35 | |
*** guilhermesp has quit IRC | 14:38 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Be explicit about base container image https://review.opendev.org/714549 | 14:38 |
*** guilhermesp has joined #zuul | 14:38 | |
*** guilhermesp has quit IRC | 14:39 | |
*** guilhermesp has joined #zuul | 14:40 | |
*** jamesmcarthur has joined #zuul | 14:51 | |
*** jamesmcarthur has quit IRC | 14:59 | |
*** jamesmcarthur has joined #zuul | 15:00 | |
*** jamesmcarthur_ has joined #zuul | 15:02 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-website master: Update for OpenDev, https https://review.opendev.org/714261 | 15:04 |
*** jamesmcarthur has quit IRC | 15:07 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Enforce sql connections for scheduler and web https://review.opendev.org/630472 | 15:34 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Improve typings in context of 630472 https://review.opendev.org/715247 | 15:34 |
y2kenny | So I was able to set up nodepool with the k8s provider and it looks to be working (it created some namespaces with a pod in each.) The nodes show up in the web UI and it looks like the scheduler is using the right executor (in cluster, so the zone thing seems to be working.) | 15:39 |
y2kenny | but when the job is running, the stream log only showed Job console starting, running ansible setup, pre-runs and post-runs | 15:40 |
y2kenny | and the job failed after 3 attempts | 15:40 |
y2kenny | so I looked into the executor log (I ran it with the debug flag on.) | 15:41 |
y2kenny | "Unable to start kubectl port forward" exception. Does that mean I needed an executor image with kubectl installed? | 15:42 |
SpamapS | y2kenny: IIRC, the streaming was very recently fixed. And yes, the executor absolutely must have kubectl. | 15:42 |
y2kenny | or does the executor needs the right role binding as well | 15:42 |
clarkb | executors now need socat and kubectl on them | 15:43 |
clarkb | if you are using the container images I expect they are present though | 15:44 |
SpamapS | https://opendev.org/zuul/zuul/commit/2881ee578599b199280c60fb76b5201dd855f419 | 15:44 |
y2kenny | I am using the official executor image from dockerhub. I know I can extend that image but I am wondering if the executor actually uses the kubectl shell command or the library | 15:44 |
y2kenny | SpamapS: AH | 15:44 |
y2kenny | missed that one | 15:44 |
SpamapS | Looks like maybe that hasn't been released yet? | 15:45 |
SpamapS | hm no, 3.18.0 has the fixes, just not the fixed release note. ;) | 15:45 |
y2kenny | I will give that a try and see how things go | 15:46 |
clarkb | the actual rendered version has it properly under 3.18 on the website | 15:46 |
fungi | https://zuul-ci.org/docs/zuul/reference/releasenotes.html#relnotes-3-18-0 | 15:51 |
*** chandan_kumar is now known as chandankumar | 15:51 | |
*** jamesmcarthur_ has quit IRC | 16:00 | |
*** jamesmcarthur has joined #zuul | 16:01 | |
mnaser | oh | 16:02 |
mnaser | something just occurred to me | 16:02 |
mnaser | do we have a semaphore or ordering enforced for our promote jobs | 16:03 |
mnaser | what if two promote jobs get queued, does one start before the other? | 16:03 |
fungi | the promote pipeline uses a supercedent pipeline manager | 16:04 |
fungi | so only one runs at a time for the same project+branch | 16:04 |
clarkb | and it enforces order | 16:05 |
mnaser | ok cool, just something that came to mind. i know we have some things we do in 'post' that end up needing a semaphore | 16:05 |
fungi | right, and any subsequently queued changes supercede one another so only the most recently triggered one is ever waiting at a time | 16:05 |
fungi | because also running builds for all the intermediate changes would just be a waste of resources and added delay | 16:06 |
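The supercedent behaviour fungi and clarkb describe is a per-pipeline setting. A minimal sketch of such a pipeline definition (the name, precedence and trigger here are illustrative, not OpenDev's exact promote pipeline):

```yaml
- pipeline:
    name: promote
    description: Publish artifacts for changes that have merged.
    manager: supercedent
    precedence: high
    trigger:
      gerrit:
        - event: change-merged
```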
*** ysandeep|away is now known as ysandeep | 16:08 | |
y2kenny | question about relationship between job/project/branch/pipeline. | 16:11 |
y2kenny | if all the jobs defined in a pipeline for a project do not run due to no matching branch, what is the result of the triggered pipeline? | 16:12 |
y2kenny | is that a noop (pass) or just no-jobs | 16:13 |
y2kenny | actually, nevermind... I think I found the relevant doc that points to no-jobs | 16:14 |
fungi | yeah, if there are no jobs zuul doesn't (well, shouldn't anyway, modulo bugs) report | 16:15 |
fungi | if you want it to report an unconditional passing result, add a noop job for those otherwise jobless branches | 16:16 |
y2kenny | ok | 16:17 |
fungi | if memory serves, the noop job is special in that it short-circuits immediately without having to even engage an executor | 16:17 |
fungi | and doesn't need to be explicitly defined anywhere to be used in a project-pipeline | 16:17 |
y2kenny | understood | 16:20 |
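The noop approach fungi describes looks roughly like this in a project stanza (project and pipeline names are illustrative); the built-in noop job needs no definition and reports success immediately:

```yaml
- project:
    name: example/project
    check:
      jobs:
        - noop
```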
AJaeger | zuul-maint, here's a small update for zuul-website, it changes some links and makes changes for OpenDev, please review https://review.opendev.org/#/c/714261/ | 16:23 |
fungi | AJaeger: rebased for a merge conflict, i guess? | 16:24 |
fungi | reapplied my prior +2 | 16:24 |
*** dpawlik has quit IRC | 16:26 | |
AJaeger | fungi: yes, indeed. Thanks! | 16:27 |
AJaeger | tobiash, thanks for second +2. | 16:29 |
AJaeger | Could somebody +A the change please or what are we waiting for? | 16:29 |
openstackgerrit | Jimmy McArthur proposed zuul/zuul-website master: Adding Infrastructure Donors https://review.opendev.org/715261 | 16:29 |
mnaser | this sat for a while: https://review.opendev.org/#/c/713469/ -- if anyone is feeling generous to put reviews too | 16:30 |
mnaser | :p | 16:30 |
y2kenny | ok so I added the start-zuul-console role and verified that kubectl is in the executor image but I still get port forwarding error. One obvious problem is the executor's access to the cluster so I gave the executor pod a clusterrolebinding that should let it do pretty much anything but that didn't seem to work. Does the executor k8s connection | 16:32 |
y2kenny | work differently than nodepool? | 16:32 |
y2kenny | like, do I have to mount in .kube/config? | 16:32 |
AJaeger | thanks, tobiash ! | 16:32 |
mnaser | y2kenny: mind pasting the error? | 16:33 |
tobiash | AJaeger: I didn't see fungi's review while reviewing it ;) | 16:33 |
*** jamesmcarthur has quit IRC | 16:33 | |
y2kenny | mnaser: | 16:33 |
y2kenny | 2020-03-26 16:28:59,212 ERROR zuul.AnsibleJob: [e: 7dc1e284988742c08c0721de11a689b2] [build: 287412faaa9c45f4932b9d8b3a8507c0] Unable to start port forward: | 16:33 |
y2kenny | Traceback (most recent call last): | 16:33 |
y2kenny |   File "/usr/local/lib/python3.7/site-packages/zuul/executor/server.py", line 1975, in prepareAnsibleFiles | 16:33 |
y2kenny |     fwd.start() | 16:33 |
y2kenny |   File "/usr/local/lib/python3.7/site-packages/zuul/executor/server.py", line 361, in start | 16:33 |
y2kenny |     raise Exception("Unable to start kubectl port forward") | 16:33 |
y2kenny | Exception: Unable to start kubectl port forward | 16:33 |
y2kenny | actually... I just verified kubectl works by exec into the pod | 16:34 |
AJaeger | tobiash: ah, parallel reviewing ;) | 16:34 |
*** jamesmcarthur has joined #zuul | 16:34 | |
y2kenny | I can do kubectl get nodes and I got the correct data | 16:34 |
mnaser | y2kenny: what if you did "kubectl port-forward" in the zuul container? | 16:34 |
y2kenny | mnaser: I will give that a try | 16:34 |
y2kenny | error: TYPE/NAME and list of ports are required for port-forward | 16:35 |
y2kenny | See 'kubectl port-forward -h' for help and examples. | 16:35 |
tobiash | y2kenny: is kubectl in a non-default location? | 16:35 |
y2kenny | tobiash: no, I just use the official executor image | 16:36 |
*** jamesmcarthur_ has joined #zuul | 16:36 | |
tobiash | y2kenny: zuul runs jobs in a sandbox and doesn't include paths like /opt by default | 16:36 |
tobiash | hrm | 16:36 |
mnaser | y2kenny: kubectl -n nodepool-ns get pods | 16:36 |
mnaser | kubectl -n nodepool-ns port-forward pod/some-random-pod | 16:36 |
mnaser | i dont remember the rest of the syntax | 16:36 |
*** jamesmcarthur has quit IRC | 16:37 | |
y2kenny | mnaser: ok | 16:38 |
y2kenny | I wonder if I need to restart nodepool after I restarted the executor | 16:38 |
mnaser | y2kenny: kubectl port-forward pod/name :19885 | 16:38 |
mnaser | and add namespace if needed | 16:38 |
mnaser | let me know what that outputs | 16:38 |
y2kenny | # kubectl -n k8s-auto-0000000260 port-forward ubuntu-bionic-k8s-auto :19885 | 16:40 |
y2kenny | Forwarding from 127.0.0.1:40309 -> 19885 | 16:40 |
openstackgerrit | Merged zuul/zuul-website master: Update for OpenDev, https https://review.opendev.org/714261 | 16:40 |
*** ysandeep is now known as ysandeep|away | 16:41 | |
*** jcapitao is now known as jcapitao_afk | 16:42 | |
tobiash | corvus: do you also find exceptions like those in opendev? http://paste.openstack.org/show/791199/ | 16:42 |
y2kenny | mnaser: so I think the port forwarding is working inside the executor pod | 16:43 |
tobiash | this leads to ignored events and occurs in our system roughly 7 times per 30 days | 16:43 |
tobiash | clarkb: ^ | 16:43 |
clarkb | tobiash: looking | 16:43 |
tobiash | thanks | 16:43 |
clarkb | tobiash: is that the scheduler? | 16:43 |
tobiash | yes | 16:44 |
mnaser | y2kenny: hmm, interesting | 16:44 |
y2kenny | I wonder if something was out of sync | 16:45 |
mnaser | y2kenny: it looks like it's failing to do the regex match .. or its failing to start the port forward | 16:45 |
y2kenny | I retriggered the job and seems like the error is not there any more | 16:45 |
mnaser | y2kenny: oh weird, maybe your executors are not all running the same image? | 16:46 |
clarkb | tobiash: searching todays log showed nothing and now I'm running a zgrep on compressed logs | 16:46 |
*** jamesmcarthur_ has quit IRC | 16:46 | |
tobiash | clarkb: k, I had three hits in the last 7 days | 16:46 |
y2kenny | does that make a difference? I only have one executor inside the cluster which I just launched | 16:46 |
y2kenny | the other executor is outside the cluster and should not be used to talk to k8s nodes | 16:46 |
tobiash | so not a huge problem, but looks like we have a slight multithreading problem there | 16:46 |
y2kenny | that one I haven't restarted in a while | 16:47 |
*** jamesmcarthur has joined #zuul | 16:47 | |
y2kenny | mnaser: actually no, the error is still there (scrolled too fast) | 16:48 |
clarkb | tobiash: its possible your cpus are faster than ours so you hit races like that more often (zgrep still running with no hits) | 16:48 |
mnaser | ohhh | 16:48 |
mnaser | i think i know what might have happened y2kenny | 16:48 |
mnaser | `kubectl_command = 'kubectl'` but we use popen without shell=True | 16:49 |
mnaser | don't we have to use a full path? | 16:49 |
clarkb | I think it will use the PATH of the calling process in that case | 16:50 |
clarkb | but ya rooting the path might fix it | 16:50 |
y2kenny | other things that seem to look like an error: | 16:52 |
y2kenny | DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'packages/ara/plugins/actions/ara_read.py) as it seems to be invalid: module' | 16:52 |
*** jamesmcarthur has quit IRC | 16:53 | |
clarkb | y2kenny: is there a traceback above that? | 16:53 |
clarkb | (that looks like the tail end of a traceback) | 16:53 |
clarkb | but I think its saying you have enabled ara but don't have it installed properly? | 16:53 |
mnaser | y2kenny: give me a second, i think i might have a repro | 16:54 |
y2kenny | e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'Using /var/lib/zuul/builds/e0c21bb20e604466b27023e42e2c2f93/ansible/setup_playbook/ansible.cfg as config file' | 16:54 |
y2kenny | 2020-03-26 16:45:03,244 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'[WARNING]: provided hosts list is empty, only localhost is available. Note that' | 16:54 |
y2kenny | 2020-03-26 16:45:03,244 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b"the implicit localhost does not match 'all'" | 16:54 |
y2kenny | 2020-03-26 16:45:03,722 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'[WARNING]: Skipping plugin (/usr/local/lib/zuul/ansible/2.8/lib/python3.7/site-' | 16:54 |
clarkb | ah ok in that case I think it's probably fine as is. Because ya it's saying I'm not doing ara things because it's not properly installed. Disabling ara or installing it properly will make the warning go away I think | 16:54 |
y2kenny | another one I see under debug as well (that can also be nothing): | 16:55 |
y2kenny | Ansible output: b"bwrap: Can't mount proc on /newroot/proc: Operation not permitted" | 16:55 |
mnaser | y2kenny: can you run http://paste.openstack.org/show/791201/ like this in the container - python test.py k8s-auto-0000000260 ubuntu-bionic-k8s | 16:56 |
mnaser | and see what it outputs? | 16:56 |
y2kenny | it's ok to not use 260 right? (It got deleted by nodepool I think) | 16:57 |
mnaser | yeah of course | 16:57 |
mnaser | whatever you have running there | 16:57 |
clarkb | y2kenny: tristanC had a note about the bwrap thing. 11:57:04 tristanC | when y2kenny comes back, iirc, `bwrap: Can't mount proc` can be caused by either the lack of userns, or that bwrap is not root setuid | 16:59 |
y2kenny | http://paste.openstack.org/raw/791201/ | 16:59 |
y2kenny | oops | 16:59 |
y2kenny | # python test.py k8s-auto-0000000273 ubuntu-bionic-k8s | 16:59 |
y2kenny | Error: unknown flag: --address | 16:59 |
y2kenny | Traceback (most recent call last): | 16:59 |
y2kenny |   File "test.py", line 32, in <module> | 16:59 |
y2kenny |     raise Exception("Unable to start kubectl port forward") | 16:59 |
mnaser | there we go | 17:00 |
clarkb | y2kenny: for lack of userns I'm not sure of an easy way to check that, but I know rhel/centos/fedora disable by default | 17:00 |
clarkb | y2kenny: suse and ubuntu enable by default iirc | 17:00 |
mnaser | y2kenny: kubectl --version ? | 17:00 |
y2kenny | clarkb: oooo I am using fedora as the cluster host | 17:00 |
mnaser | ohhh i know what might be happening | 17:00 |
clarkb | y2kenny: in that case I would check if it is setuid as that should solve it for you | 17:00 |
mnaser | y2kenny: can you replace '--address', '127.0.0.1' by ... '--address=127.0.0.1', | 17:01 |
mnaser | so you end up with | 17:01 |
mnaser | https://www.irccloud.com/pastebin/CKylsa81/ | 17:01 |
y2kenny | # kubectl version | 17:01 |
y2kenny | Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-10T16:38:01Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"} | 17:01 |
y2kenny | Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"} | 17:01 |
y2kenny | (is my cluster too new?) | 17:01 |
mnaser | no i think its the bug of how we use --address | 17:02 |
mnaser | it should be --address=127.0.0.1 i think | 17:02 |
y2kenny | the error is the same | 17:04 |
y2kenny | unknown flag address | 17:04 |
*** bolg has quit IRC | 17:05 | |
clarkb | tobiash: zgrep finished with no hits | 17:05 |
y2kenny | yea... the kubectl in the image is too old for that | 17:05 |
tobiash | clarkb: ok, thanks | 17:05 |
clarkb | tobiash: thats 30 days of logs fwiw | 17:05 |
y2kenny | the kubectl in the executor image is 1.11 and it does not have the --address flag. kubectl 1.17 has it | 17:05 |
tristanC | clarkb: only el7 has userns disabled by default, el8 and fedora should allow it out of the box | 17:05 |
mnaser | y2kenny: i think we're installing a really old kubectl version which doesnt have --address | 17:05 |
clarkb | tristanC: oh that is good to know, thanks | 17:06 |
y2kenny | mnaser: yes... it's 1.11 in the image | 17:06 |
tristanC | y2kenny: are you running the executor in kubernetes? and are they using a privileged pod? | 17:06 |
mnaser | the issue is kubectl does not have --address cause its old | 17:06 |
y2kenny | tristanC: yes in k8s, no on privileged | 17:06 |
tristanC | y2kenny: then you need to use a privileged pod for bwrap, that is what's preventing the proc mounting | 17:07 |
y2kenny | ok. No need to do additional settings on the host side due to fedora? | 17:07 |
tristanC | y2kenny: i guess you already disabled cgroupv2 for k8s, or you are not using f31. | 17:08 |
y2kenny | f30 :) | 17:08 |
mnaser | where are we installing kubectl from? | 17:10 |
tristanC | y2kenny: upgrading to f31 should work fine by adding `systemd.unified_cgroup_hierarchy=0` to the linux cmdline | 17:11 |
y2kenny | tristanC: understood. | 17:11 |
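A sketch of the relevant fragment of an executor pod spec for what tristanC suggests: running the executor container privileged lets bwrap mount /proc inside its sandbox (the container name and image tag are illustrative):

```yaml
spec:
  containers:
    - name: executor
      image: docker.io/zuul/zuul-executor:latest
      securityContext:
        privileged: true
```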
*** bolg has joined #zuul | 17:11 | |
*** jamesmcarthur has joined #zuul | 17:12 | |
y2kenny | yay!!! \o/ | 17:13 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Protect getCachedChanges from concurrent modification https://review.opendev.org/715270 | 17:14 |
y2kenny | wait... celebrated too early | 17:14 |
tobiash | clarkb, corvus: This should fix this ^ | 17:14 |
fungi | it's never too early to celebrate | 17:15 |
* fungi lives in a constant state of celebration | 17:15 | |
y2kenny | lol | 17:15 |
openstackgerrit | Jimmy McArthur proposed zuul/zuul-website master: Adding Infrastructure Donors https://review.opendev.org/715261 | 17:17 |
*** jcapitao_afk is now known as jcapitao | 17:17 | |
tobiash | clarkb: btw, this exception always happened during high load on our scheduler so might hit us more often when the scheduler is contended | 17:19 |
y2kenny | so I don't see the bwrap error with privileged (thanks tristanC) | 17:20 |
y2kenny | and I get more log from the job | 17:20 |
tobiash | we're starting to suffer from more stalls during tenant reconfigurations (which are mostly cpu bound) | 17:20 |
tobiash | so looking forward to scale out scheduler | 17:20 |
y2kenny | but I still get failure with 3 attempts | 17:20 |
y2kenny | mnaser: does the port forwarding affect job execution? or just log streaming? | 17:23 |
*** evrardjp has quit IRC | 17:49 | |
*** evrardjp has joined #zuul | 17:49 | |
fungi | mnaser: are you good with https://review.opendev.org/715261 ? | 17:49 |
*** openstack has quit IRC | 17:49 | |
*** openstack has joined #zuul | 17:53 | |
*** ChanServ sets mode: +o openstack | 17:53 | |
*** jcapitao has quit IRC | 17:53 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Protect getCachedChanges from concurrent modification https://review.opendev.org/715270 | 17:54 |
tobiash | clarkb: updated for all drivers ^ | 17:54 |
clarkb | thanks! | 17:54 |
*** jpena is now known as jpena|off | 18:02 | |
*** sshnaidm is now known as sshnaidm|afk | 18:17 | |
*** jamesmcarthur has quit IRC | 18:20 | |
*** jamesmcarthur has joined #zuul | 18:21 | |
*** hashar has quit IRC | 18:21 | |
*** jamesmcarthur has quit IRC | 18:24 | |
*** jamesmcarthur has joined #zuul | 18:24 | |
*** jamesmcarthur has quit IRC | 18:26 | |
*** jamesmcarthur has joined #zuul | 18:27 | |
*** jamesmcarthur has quit IRC | 18:36 | |
*** jamesmcarthur has joined #zuul | 18:37 | |
*** jamesmcarthur has quit IRC | 18:50 | |
*** jamesmcarthur has joined #zuul | 18:50 | |
y2kenny | I just noticed another error when trying to use the k8s driver with labels.type = namespace: | 19:41 |
y2kenny | File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/launcher.py", line 81, in main return NodePoolLauncherApp.main() File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 249, in main return super(NodepoolDaemonApp, cls).main(argv) File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line | 19:41 |
y2kenny | 196, in main return cls()._main(argv=argv) File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 186, in _main return self._do_run() File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 230, in _do_run return super(NodepoolDaemonApp, self)._do_run() File | 19:41 |
y2kenny | "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 192, in _do_run return self.run() File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/launcher.py", line 61, in run config = self.pool.loadConfig() File "/usr/local/lib/python3.7/site-packages/nodepool/launcher.py", line 925, in loadConfig config = | 19:41 |
y2kenny | nodepool_config.loadConfig(self.configfile) File "/usr/local/lib/python3.7/site-packages/nodepool/config.py", line 264, in loadConfig newconfig.setProviders(config.get('providers')) File "/usr/local/lib/python3.7/site-packages/nodepool/config.py", line 150, in setProviders p.load(self) File | 19:41 |
y2kenny | "/usr/local/lib/python3.7/site-packages/nodepool/driver/kubernetes/config.py", line 91, in load pp.load(pool, config) File "/usr/local/lib/python3.7/site-packages/nodepool/driver/kubernetes/config.py", line 62, in load full_config.labels[label['name']].pools.append(self)KeyError: 'zuul-nodes' | 19:41 |
y2kenny | ( I have name: zuul-nodes, type: namespace.) | 19:42 |
y2kenny | not sure if it's my configuration error or a real bug | 19:42 |
fungi | y2kenny: it may be easier to paste lengthy output like tracebacks somewhere like http://paste.openstack.org/ and then link them in here | 19:43 |
y2kenny | fungi: right... sorry about that: | 19:43 |
y2kenny | http://paste.openstack.org/show/791210/ | 19:43 |
fungi | thanks! that's a bit more readable at least | 19:45 |
Shrews | y2kenny: sort of seems like a config error. make sure you have that label listed in the top-most labels section (https://zuul-ci.org/docs/nodepool/configuration.html#attr-labels) | 19:45 |
y2kenny | oh right... I forgot about that correspondence. | 19:46 |
y2kenny | sorry about the noise | 19:46 |
fungi | i guess label['name'] is looking up to a value of "zuul-nodes" there and it's not been loaded into the full_config.labels dict | 19:46 |
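In other words, the pool label also has to be declared in the top-level labels section. A minimal sketch of the kubernetes driver configuration (provider and pool names mirror y2kenny's setup but are illustrative, and connection details are omitted):

```yaml
labels:
  - name: zuul-nodes

providers:
  - name: k8s-containers
    driver: kubernetes
    pools:
      - name: k8s-auto
        labels:
          - name: zuul-nodes    # must match an entry in the top-level labels list
            type: namespace
```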
y2kenny | I am actually not too sure what this config does so I thought I would play with it a bit. With labels.type=pod, it creates min-ready # of namespaces | 19:47 |
y2kenny | ok so it created a namespace as defined by the driver config. Do you know how zuul will use this, fungi? | 19:52 |
fungi | i'm not familiar with the kubernetes driver, i was mostly trying to reverse engineer the error from the traceback | 19:54 |
y2kenny | ok | 19:57 |
y2kenny | it's not a big deal, I am just curious on its function. | 19:57 |
*** jamesmcarthur has quit IRC | 20:09 | |
corvus | y2kenny: you'll see a namespace either way... | 20:12 |
y2kenny | corvus: so what I noticed is the listing in the UI->Nodes | 20:12 |
corvus | y2kenny: if you request a pod, it still gets a namespace so that it can be set up securely, but there will also be a pod, and that's what zuul puts in the inventory. the namespace isn't really accessible, it's an implementation detail. | 20:12 |
corvus | y2kenny: if you request a namespace, then you'll get a namespace without a pod, and zuul gets information about how to use the namespace, but nothing in the inventory | 20:13 |
y2kenny | so for the second case, how would I use the namespace without a pod | 20:14 |
corvus | y2kenny: the listing in the nodes tab should either be for a pod or a namespace; you shouldn't see an extra namespace there (what i was describing is what you would see if you looked in k8s) | 20:14 |
y2kenny | so I see additional nodes with label zuul-nodes (the name of the namespace in this case) with connection namespace | 16:15 |
*** jamesmcarthur has joined #zuul | 20:15 | |
y2kenny | the server is auto generated with -d (in my case my pool name is k8s-auto , so the server is k8s-auto-0000000279 for example) | 20:16 |
y2kenny | Full line: 0000000279  zuul-nodes  namespace  k8s-auto-0000000279  k8s-containers  ready  23 minutes ago | 16:16 |
corvus | that looks right; the next job that requests a "zuul-nodes" label will get that namespace assigned to it | 20:17 |
corvus | y2kenny: as for using it, info about connecting to it will appear in the zuul.resources ansible variable | 20:17 |
y2kenny | ok. So it's up to a zuul role to launch something in it | 20:18 |
y2kenny | ? | 20:18 |
y2kenny | zuul role or ansible role? | 20:18 |
corvus | y2kenny: yes... tristanC do you know if we have anything to help with that? | 20:18 |
tobiash | corvus, AJaeger: zuul py35 jobs are failing since the release of cliff 3.0.0 which seems to require py36 | 20:18 |
corvus | y2kenny: see here: https://zuul-ci.org/docs/zuul/reference/jobs.html#var-zuul.items.resources | 20:19 |
corvus | y2kenny: that has some examples about how to use it | 20:19 |
y2kenny | OOOOH | 20:20 |
tobiash | AJaeger: was it intended to also drop py35 there https://review.opendev.org/705612 ? | 20:20 |
corvus | y2kenny: zuul will write out the .kube/config file for you, so it should be all ready to go | 20:20 |
y2kenny | I did not see that bit (I was looking at that page for some other variables.) | 20:20 |
corvus | y2kenny: we should probably cross-reference that better from the nodepool docs | 20:20 |
tristanC | corvus: y2kenny: here is playbook that populate a namespace: https://review.opendev.org/#/c/570669/8/playbooks/openshift/pre.yaml | 20:21 |
corvus | tristanC: ah, great example, thanks! | 20:21 |
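A rough sketch of what using such a namespace from a job playbook can look like, following the zuul.resources documentation linked above (the resource key assumes the nodeset names the node zuul-nodes; the pod definition is purely illustrative):

```yaml
- hosts: localhost
  tasks:
    - name: Create a pod in the namespace allocated by nodepool
      k8s:
        state: present
        context: "{{ zuul.resources['zuul-nodes'].context }}"
        namespace: "{{ zuul.resources['zuul-nodes'].namespace }}"
        definition:
          apiVersion: v1
          kind: Pod
          metadata:
            name: builder
          spec:
            containers:
              - name: builder
                image: docker.io/library/ubuntu:18.04
                command: ["sleep", "infinity"]
```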
Shrews | before i forget, https://review.opendev.org/#/q/topic:node-attr has some changes y2kenny needs. i can fix up the merge conflict once we merge its parent | 20:22 |
corvus | tobiash: any idea what we use cliff for? | 20:22 |
tobiash | corvus: looks to be a transitive dependency | 20:22 |
tobiash | I don't know yet what pulls it in, maybe openstacksdk? | 20:22 |
tobiash | I just learned that cliff exists | 20:23 |
Shrews | i don't think sdk depends on cliff. i think osc does | 20:23 |
y2kenny | tristanC, corvus: thanks for the examples. | 20:23 |
y2kenny | mnaser, tristan: so I manually upgraded the kubectl inside the executor container image | 20:24 |
mnaser | y2kenny: cool -- how did that work? | 20:25 |
*** saneax has quit IRC | 20:25 | |
mnaser | fwiw we should install newer kubectl in our images.. | 20:25 |
mnaser | we have like 1.11 or some ancient version that doesn't support --address | 20:25 |
mordred | sdk definitely doesn't depend on cliff - would be interesting to know what's pulling it in | 20:25 |
corvus | mnaser: what's --address for? | 20:25 |
y2kenny | mnaser: that seems to have solved the exception (I don't see the error in the executor log any more, but I am still not able to get the pod to do useful work.) | 20:26 |
mnaser | corvus: it seems you used --address in port-forward to force it to listen to 127.0.0.1 | 20:26 |
mnaser | but it seems like the default value _might_ already be 127.0.0.1 | 20:26 |
corvus | oh | 20:26 |
mnaser | so we could drop it and not have to install newer kubectl | 20:26 |
corvus | we get current kubectl from openshift clients | 20:26 |
corvus | so if we do need to upgrade, we can look at either whether there's a newer openshift, or if we also need to install a dedicated kubectl | 20:27 |
corvus | but yeah, maybe dropping address would be better | 20:27 |
corvus | mnaser, y2kenny: i can try a test without --address (i shoud be able to just manually set up the same kind of port forward) using the current openshift kubectl and see if that's a solution | 20:28 |
corvus | mordred: any idea the best way to find out what's using cliff? | 20:28 |
corvus | (pip --graphviz would be great :) | 20:28 |
mordred | corvus: we can just look in the tox logs | 20:28 |
mordred | one sec | 20:28 |
mordred | this is zuul or nodepool? | 20:29 |
corvus | tobiash: have a link to a failure online, or you just seeing this locally now? | 20:29 |
mnaser | corvus: given we use ubuntu, why don't we use the kubeernetes provided deb packages? | 20:29 |
tobiash | corvus: https://c7cc7615691360b259ca-b4831d08abece0047714b6befcdb357a.ssl.cf1.rackcdn.com/715270/2/check/tox-py35/8dfd7e8/job-output.txt | 20:29 |
corvus | mnaser: need openshift clients, and we get kubectl "for free" | 20:29 |
AJaeger | tobiash: yes, dropping py3.5 was intended | 20:29 |
corvus | mnaser: not a big deal, just seemed like unnecessary extra work at the time | 20:29 |
mnaser | oh, yes, gotcha | 20:29 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Cap cliff to <3.0.0 https://review.opendev.org/715303 | 20:30 |
tristanC | fwiw, zuul-operator could run kubectl node and console-stream integration test, this initial change is still waiting for review: https://review.opendev.org/714165 | 20:30 |
mnaser | i don't know if the openshift client's port-forward behaviour is to listen on 127.0.0.1 by default | 20:30 |
corvus | mnaser: yeah, i'll test it with the actual openshift kubectl | 20:30 |
tobiash | I guess then we need to cap it or drop py35 in zuul as well | 20:30 |
mordred | corvus: Collecting cliff>=2.8.0 (from stestr>=1.0.0->-r /Users/mordred/src/opendev.org/zuul/zuul/test-requirements.txt (line 6)) | 20:31 |
AJaeger | tobiash: that's what all (most?) of oslo did as part of the Ussuri goal | 20:31 |
mordred | corvus: stestr | 20:31 |
corvus | tristanC: i was just about to review that when this happened :( | 20:31 |
corvus | mordred: can we avoid stestr? | 20:31 |
corvus | (this is the sort of dependency we tried really hard to avoid in zuul and nodepool) | 20:31 |
y2kenny | mnaser: do you know if all the pre- and post-runs are done on the target node or locally on the executor? | 20:31 |
mnaser | corvus: cool, i gave y2kenny a small test script and yeah the error he was getting was --address is missing error | 20:31 |
mordred | corvus: it would probably be quicker to just fix cliff | 20:32 |
corvus | like, i don't understand why stestr would drop 3.5 support | 20:32 |
mnaser | y2kenny: playbooks run on remote node | 20:32 |
corvus | mordred: AJaeger says it was intentional in cliff | 20:32 |
tobiash | I guess stestr doesn't want to but didn't cap cliff as well | 20:32 |
mordred | well - as the current PTL of cliff, let me return to that question real quick | 20:32 |
corvus | mordred: i await your proclamation with renewed interest :)- | 20:32 |
corvus | wow, sorry about the soul patch there, that was a typo | 20:33 |
AJaeger | overall idea was AFAIK to remove py35 unless there was a need to keep it - and there were few exceptions only. I just updated a few repos | 20:33 |
tobiash | since stestr is wider spread than openstack that might be such an exception | 20:34 |
mordred | corvus: remote: https://review.opendev.org/715305 Re-add support for python 3.5 | 20:34 |
AJaeger | https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html#completion-criteria has "The minimum version of Python now supported by <project> is Python 3.6." as note | 20:34 |
mordred | corvus: feel free to +2 that | 20:34 |
corvus | yeah, i'm getting the idea here that stestr probably didn't intend this and fixing cliff is the correct solution | 20:34 |
mordred | and I'll request a release as soon as it's landed | 20:35 |
y2kenny | Oh... I think I just caught a problematic line. For the "prepare-workspace" task, I got "Output suppressed because no_log was given" | 20:35 |
AJaeger | mordred: want to run py35 tests as well? | 20:35 |
tobiash | I guess that makes sense | 20:36 |
mordred | AJaeger: probably not a bad idea :) | 20:36 |
corvus | mordred, AJaeger: +2 on 715305 and i think running py35 tests is reasonable | 20:36 |
mordred | AJaeger: is there a template? or should I just add in openstack-tox-py35 ? | 20:36 |
mordred | corvus, AJaeger: updated with py35 jobs added | 20:37 |
AJaeger | mordred: openstack-python35-jobs is the template | 20:37 |
mordred | oh - there's a ... one sec | 20:37 |
mordred | done | 20:38 |
y2kenny | what is the best way to debug the ansible being run on the target node? (is there a way to have the executor run the ansible playbook with vvvv?) | 20:38 |
AJaeger | mordred: LGTM | 20:38 |
corvus | y2kenny: run "zuul-executor verbose" on the executor | 20:38 |
corvus | y2kenny: it will switch to "-vvv" (i'm pretty sure it's 3 not 4 v's) for all subsequent jobs | 20:39 |
mordred | corvus, tobiash: as soon as that lands I'll get a release cut | 20:39 |
tobiash | awesome, I guess I can drop the cap then :) | 20:39 |
corvus | y2kenny: then "zuul-executor unverbose" will restore normal behavior | 20:39 |
y2kenny | corvus: thanks. I will give that a try | 20:39 |
mordred | I've got 2 other releases I need cut today anyway, so my today is mostly watching patches land | 20:39 |
tobiash | y2kenny: don't forget the unverbose, I once managed to fill the hard disks over lunch :-P | 20:40 |
y2kenny | tobiash: :) | 20:41 |
y2kenny | do I need to restart the executor? | 20:41 |
AJaeger | mordred: need 90 mins to run | 20:41 |
corvus | y2kenny: no, if you do, it'll also go back to normal (non-verbose) mode | 20:42 |
tobiash | no, just execute that command, that will tell the running executor to switch behavior | 20:42 |
y2kenny | um... ok... so perhaps the log I am looking at is not what I think it is | 20:42 |
y2kenny | I am looking at the console.log from the web UI | 20:42 |
tobiash | y2kenny: you need to look at the executor logs then | 20:42 |
mordred | AJaeger: I'm also waiting on two openstacksdk patches to land so we can cut a release to also fix an issue with declared python support - in this case we forgot to put in a "we only support python3" while adding a python3-only dep | 20:42 |
tobiash | those will contain the ansible debug logs | 20:42 |
corvus | y2kenny: the easiest way to get the right output is to find the build uuid (it's in the url for the build) and grep for that in the executor logs | 20:43 |
y2kenny | is it a separate file or do they get dumped to stdout? | 20:44 |
y2kenny | my case is easier... still prototyping so there's really just one job :) | 20:44 |
y2kenny | I see a lot of build: <hash> and e: <hash> | 20:44 |
corvus | y2kenny: in k8s, should go to stdout, but those might be at info-level only while the ansible stuff is at debug level? let me check | 20:44 |
*** jamesmcarthur has quit IRC | 20:45 | |
y2kenny | I see some DEBUG tag (I am running executor in debug mode) but those are executor debug not ansible debug? | 20:45 |
corvus | ok good, then they should show up there | 20:45 |
y2kenny | or ansible debug will get propagated to there? | 20:45 |
y2kenny | ok | 20:45 |
AJaeger | mordred: https://review.opendev.org/#/c/715243/6/releasenotes/notes/python-3.5-629817cec092d528.yaml has a typo, doesn't it? | 20:45 |
mordred | AJaeger: yup! | 20:46 |
corvus | y2kenny: all of the ansible output that goes to the console log should also end up in the zuul executor debug log, and when in verbose mode, all the *extra* ansible output should end up there too | 20:46 |
mnaser | corvus: we can drop --address | 20:46 |
mnaser | root@c0d83c34bf93:~# kubectl port-forward pod/nginx-deployment-574b87c764-795g5 :80 => Forwarding from 127.0.0.1:34177 -> 80 | 20:46 |
mnaser | (that's in the zuul/zuul-executor docker container) | 20:46 |
corvus | y2kenny: those should all start with "Ansible output: " | 20:46 |
mordred | AJaeger: thanks - that turns out to be an important typo | 20:47 |
corvus | mnaser: awesome, thanks, that's more or less what i was going to test but hadn't gotten around to; you want to push the patch or should i? | 20:47 |
y2kenny | corvus: ok yes! I see the verbose now | 20:47 |
mnaser | corvus: i will right now | 20:47 |
AJaeger | mordred: one more typo found - sorry ;( | 20:48 |
AJaeger | mordred: finally the note makes sense, took me some time | 20:49 |
mordred | AJaeger: wow | 20:49 |
mordred | AJaeger: done! | 20:49 |
AJaeger | thanks | 20:50 |
corvus | tristanC: that integration test looks great; i like the git service -- that keeps it nice and separate from the scheduler pod. | 20:51 |
AJaeger | mordred: I'll add another "." to your note... | 20:51 |
openstackgerrit | Mohammed Naser proposed zuul/zuul master: executor: drop --address=127.0.0.1 from kubectl https://review.opendev.org/715308 | 20:52 |
tristanC | corvus: nice thanks! I'm adding the nodepool-launcher service to the operator, and i think it will be easy to run the integration test on a kubernetes nodeset | 20:52 |
mnaser | corvus: ^ | 20:52 |
y2kenny | corvus: ok so the verbose didn't quite help in this particular case because the failing role is the "prepare-workspace" and no_log was turned on (I didn't realize what that no_log means until I looked at the code for the role just now.) | 20:54 |
AJaeger | mordred: pushed a cleanup on your stack as well - https://review.opendev.org/#/c/715309/ | 20:56 |
corvus | y2kenny: oh sorry, i should have caught that | 20:57 |
corvus | y2kenny: i think that no_log is in there just because it's really verbose otherwise | 20:57 |
y2kenny | yea... since it's a file sync | 20:57 |
y2kenny | so the pod image I am using for this is the one from the tutorial | 20:57 |
corvus | mordred: we should maybe make the no_log a flag, so someone in y2kenny's situation could turn that off in the base job | 20:57 |
mordred | corvus: ++ | 20:58 |
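A hypothetical sketch of what such a flag could look like in the role's copy task (the prepare_workspace_quiet variable name is made up here, and the synchronize parameters are simplified relative to the real role):

```yaml
- name: Copy prepared repos from the executor to the node
  synchronize:
    src: "{{ zuul.executor.src_root }}/"
    dest: "{{ ansible_user_dir }}/src/"
  # hypothetical flag; the role currently hard-codes no_log
  no_log: "{{ prepare_workspace_quiet | default(true) }}"
```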
corvus | y2kenny: oh, if you're synchronizing to a pod, you'll need to do something different | 20:58 |
y2kenny | BUT, because I am using the pod type, it's launching the image plain w/o the volume | 20:58 |
y2kenny | i.e. w/o the ssh key | 20:58 |
y2kenny | I am guessing that's the issue? | 20:59 |
corvus | y2kenny: to be clear, this is using a "pod" nodepool type, not "namespace"? | 20:59 |
y2kenny | that's correct | 21:00 |
y2kenny | I haven't gone to the namespace usage yet | 21:00 |
y2kenny | but I have a feeling that's probably the one more useful to me | 21:00 |
y2kenny | since I doubt I will have a build/test job that would work right away by launching a plain pod | 21:01 |
corvus | y2kenny: for a pod, i think you need to use the "prepare-workspace-openshift" role | 21:01 |
corvus | (yes, it works for k8s) | 21:01 |
y2kenny | oh!... I will give that a try | 21:01 |
y2kenny | so just a quick side question, do you guys then have different base/pre.yaml or base job for different nodeset? | 21:02 |
y2kenny | to manage this type of variation? | 21:02 |
corvus | y2kenny: basically, i think synchronize doesn't work with k8s, so instead, it runs "oc rsync"; that's one of the reasons we automatically include the openshift client on the executor image | 21:03 |
y2kenny | ooo... | 21:03 |
* mnaser would ideally like that we use kubernetes-isms instead of openshift-isms in our roles :> | 21:04 | |
corvus | y2kenny: i think that's what tristanC uses in his mixed environments; you can have as many base jobs as you want, so if you've got container jobs, make sure they inherit from the container base job. another approach would be to extend the base pre-playbook to include either the prepare-workspace or prepare-workspace-openshift role depending on what's in the inventory | 21:04 |
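The first approach corvus mentions, a dedicated base job for container nodesets, could look roughly like this (job, playbook, and label names are illustrative):

```yaml
- job:
    name: base-container
    parent: null
    description: Base job for jobs that run in a pod.
    pre-run: playbooks/base-container/pre.yaml    # uses prepare-workspace-openshift
    post-run: playbooks/base-container/post.yaml

- job:
    name: unit-tests-in-pod
    parent: base-container
    run: playbooks/unit-tests.yaml
    nodeset:
      nodes:
        - name: pod
          label: ubuntu-bionic-k8s
```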
corvus | mnaser: yes, in this case there's a thing the openshift client supports that kubectl doesn't: rsync | 21:04 |
corvus | but it works with both k8s and openshift | 21:05 |
corvus | so it makes things easy for us | 21:05 |
mnaser | oh i see | 21:05 |
* mnaser just sometimes assumes everyone is running inside containers | 21:05 | |
corvus | i think we talked about duplicating or renaming that role (to reduce confusion) but i don't recall if that made it all the way to a patch yet | 21:05 |
*** y2kenny has quit IRC | 21:06 | |
tristanC | corvus: y2kenny: ftr, here is a base pre that use different prepare roles depending of the connection: https://pagure.io/fedora-project-config/blob/master/f/playbooks/base/pre.yaml#_12 | 21:06 |
corvus | tristanC: yeah, maybe that's the sort of thing we should do in zuul-base-jobs and the zuul docs | 21:06 |
tristanC | corvus: well, ideally the prepare-workspace would do the right thing, but there is also the issue with build-sshkey which fails with kubectl | 21:07 |
*** y2kenny has joined #zuul | 21:08 | |
mnaser | is someone looking into the tox-py35 failures? | 21:08 |
corvus | tristanC: good point; maybe we can update both of those to protect with "ansible_connection != kubectl" | 21:08 |
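The second approach, a single base pre-run that picks the prepare role based on the connection type along the lines of tristanC's linked playbook, would look something like this (the role names are the existing zuul-jobs roles; the structure is a sketch, not the linked playbook verbatim):

```yaml
- hosts: all
  tasks:
    - name: Prepare the workspace on container nodes
      include_role:
        name: prepare-workspace-openshift
      when: ansible_connection == 'kubectl'

    - name: Prepare the workspace on regular nodes
      include_role:
        name: prepare-workspace
      when: ansible_connection != 'kubectl'
```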
corvus | mnaser: yes, that's the cliff thing from earlier; pending a cliff release | 21:08 |
mnaser | i can pick it up if someone isnt looking | 21:08 |
mnaser | ah okay, cool | 21:08 |
corvus | mnaser: https://review.opendev.org/715305 | 21:08 |
*** jamesmcarthur has joined #zuul | 21:08 | |
corvus | actually, i guess pending that landing then a cliff release | 21:08 |
* y2kenny got disconnected due to stupid vpn | 21:09 | |
mnaser | all you missed ^ https://www.irccloud.com/pastebin/GDqUsdr5/ | 21:10 |
y2kenny | mnaser: thanks! those are important bits | 21:11 |
mnaser | np | 21:11 |
* mnaser off to seek food | 21:11 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool launcher service initial deployment https://review.opendev.org/715310 | 21:20 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool external config https://review.opendev.org/715311 | 21:20 |
*** jamesmcarthur has quit IRC | 21:30 | |
*** jamesmcarthur has joined #zuul | 21:33 | |
*** tjgresha__ has joined #zuul | 21:33 | |
*** tjgresha_ has quit IRC | 21:35 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool external config https://review.opendev.org/715311 | 21:48 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Adapt the integration playbook to be usable locally https://review.opendev.org/714163 | 21:48 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool labels to integration test https://review.opendev.org/715316 | 21:48 |
tristanC | corvus: would you mind checking https://review.opendev.org/714163 too, it's a small re-org that really helps to run the test locally | 21:49 |
tristanC | then https://review.opendev.org/715316 (and its parents) should demonstrate a working console-stream when using a kubectl connection with mnaser's change that removes the faulty `--address` argument | 21:50 |
*** jamesmcarthur has quit IRC | 21:59 | |
*** jamesmcarthur has joined #zuul | 22:01 | |
mordred | tobiash, corvus: cliff change is in the gate | 22:21 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool labels to integration test https://review.opendev.org/715316 | 22:25 |
*** tjgresha_ has joined #zuul | 22:35 | |
*** tjgresha__ has quit IRC | 22:38 | |
*** rfolco has quit IRC | 22:45 | |
*** jamesmcarthur has quit IRC | 22:46 | |
*** rfolco has joined #zuul | 22:46 | |
*** rfolco has quit IRC | 22:52 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 23:00 |
*** jamesmcarthur has joined #zuul | 23:02 | |
*** y2kenny has quit IRC | 23:04 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Remove bashate from test-requirements https://review.opendev.org/715328 | 23:23 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 23:24 |
tristanC | and it seems like install-kubernetes broke, it's now failing with `X The none driver requires conntrack to be installed for kubernetes version 1.18.0` | 23:26 |
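One way to unblock jobs using the install-kubernetes role would be to install the missing dependency before the role runs; a sketch, assuming the node's package manager provides a conntrack package:

```yaml
- hosts: all
  tasks:
    - name: Install conntrack, required by minikube's none driver for k8s 1.18
      package:
        name: conntrack
      become: true
```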
tristanC | corvus: also it seems like the zuul-operator functional test is a bit flaky; it seems like when adding a connection, the scheduler sometimes restarts before the merger, and the cat job fails unexpectedly, resulting in the scheduler not loading the tenant config | 23:27 |
corvus | tristanC: can the operator manage that sequence? | 23:27 |
tristanC | corvus: the role is patching the services in order, but it seems like kubernetes sometimes restarts the services out of order... here is the task: https://opendev.org/zuul/zuul-operator/src/branch/master/roles/zuul-restart-when-zuul-conf-changed/tasks/main.yaml#L13 | 23:29 |
tristanC | i guess we could add a lockstep between each patch, but wouldn't it be easier if the scheduler simply retried failed cat jobs? | 23:30 |
corvus | tristanC: yes, but only if the merger is actually going to get fixed. if it's running with the wrong configuration, it'll never succeed | 23:30 |
corvus | in that case, a hard, fast failure is better | 23:30 |
corvus | so i'd prefer the operator do what we would ask a human to do, which is restart the services in the correct order | 23:30 |
corvus | it's probably worth checking that the merger is up before proceeding to the scheduler | 23:31 |
corvus | or, at least checking that the merger is down :) | 23:31 |
tristanC | i meant, a simple cat job retry would likely mitigate that failure | 23:31 |
corvus | doesn't actually have to be up | 23:31 |
corvus | tristanC: probably not | 23:31 |
corvus | tristanC: imagine a sizable cluster with 20 mergers and 30 executors | 23:31 |
corvus | tristanC: we can retry cat jobs *really fast*; much faster than a scheduler can restart | 23:32 |
*** jamesmcarthur has quit IRC | 23:32 | |
tristanC | corvus: the scheduler cat job failed at 2020-03-26 21:38:16,661 in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e64/715310/1/check/zuul-operator-functional-k8s/e645105/docker/k8s_scheduler_zuul-scheduler-0_default_7f90728a-333d-44d2-8e85-be73693b0f68_0.txt | 23:32 |
*** jamesmcarthur has joined #zuul | 23:33 | |
tristanC | corvus: the merger got started at 2020-03-26 21:38:18,279 in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e64/715310/1/check/zuul-operator-functional-k8s/e645105/docker/k8s_merger_zuul-merger-75d5bd4644-klnhg_default_a6bdc51c-5e50-4a64-ba28-8eba1cbfecb1_0.txt | 23:33 |
corvus | tristanC: right, i get that slowing it down 2 seconds in this test job would fix this race in this test. but this is a production bug, so we should fix it systemically so this doesn't happen for large installations. | 23:34 |
tristanC | corvus: so... what would be the `ready` condition for merger service? | 23:34 |
corvus | tristanC: they don't actually have to be ready, they just have to not be running the old config. | 23:35 |
corvus | tristanC: the easiest thing to do (and what we do with the opendev restart script) is to make sure they're *down*, not up. | 23:35 |
tristanC | corvus: oh ok, then that's something we should be able to do | 23:36 |
corvus | https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_restart.yaml | 23:36 |
corvus | (obvs not k8s, but that's the play structure) | 23:36 |
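Roughly the play structure of the linked playbook, applied to the ordering discussed here: make sure mergers (and executors) are down before the scheduler comes back up with the new config. Host group and service names are illustrative, not OpenDev's exact play:

```yaml
- hosts: merger
  tasks:
    - name: Stop mergers so none of them serves the old configuration
      service:
        name: zuul-merger
        state: stopped
      become: true

- hosts: executor
  tasks:
    - name: Stop executors for the same reason
      service:
        name: zuul-executor
        state: stopped
      become: true

- hosts: scheduler
  tasks:
    - name: Restart the scheduler only once the other components are down
      service:
        name: zuul-scheduler
        state: restarted
      become: true
```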
*** jamesmcarthur has quit IRC | 23:38 | |
tristanC | in k8s, i guess we can process the mergers and executors, and make sure they are in the desired state before doing the scheduler. | 23:42 |
tristanC | since we use deployment and statefulset, i don't think we can easily manage each pod individually | 23:42 |
tristanC | perhaps we could delete instead of patching, but then i think we'll lose the persistent volume | 23:43 |
tristanC | time to afk for me. have a good rest of the day folks | 23:45 |
*** jamesmcarthur has joined #zuul | 23:53 | |
*** tosky has quit IRC | 23:53 |