Friday, 2019-01-25

*** rlandy has quit IRC		00:48
pabelanger	I'm looking into project-ssh-key for openstack, but for some reason they don't load properly: https://zuul.openstack.org/api/project-ssh-key/openstack-infra/project-config.pub	02:33
pabelanger	okay, thanks to help from tristanC curl does work	02:46
*** bhavikdbavishi has joined #zuul		02:47
tristanC	pabelanger: fwiw the url also loads fine with Firefox	02:50
*** bhavikdbavishi has quit IRC		02:52
*** chandankumar has quit IRC		02:57
*** chandankumar has joined #zuul		02:59
*** bhavikdbavishi has joined #zuul		02:59
*** bjackman has quit IRC		03:25
pabelanger	tristanC: hmm, so maybe just chrome. I actually get an error: http://paste.openstack.org/show/743437/	03:27
ianw	how bizarre, i paste that link into my firefox and get a non-working zuul status page	03:31
ianw	but then shift reload it, and get the key	03:31
ianw	some sort of weird caching thing?	03:31
*** bjackman has joined #zuul		03:34
*** sanjayu_ has joined #zuul		03:35
pabelanger	refresh doesn't work for me on chrome	03:38
pabelanger	I just see Fetching info...	03:39
*** sanjayu_ has quit IRC		03:39
ianw	tristanC / pabelanger: yeah -> https://imgur.com/a/jqWjNlO	03:50
ianw	it looks like project-config.pub is served up from a service worker	03:50
ianw	which messes up the app ... when i do a shift-reload it actually grabs it from the remote end	03:52
*** bhavikdbavishi has quit IRC		04:03
*** bhavikdbavishi has joined #zuul		04:12
tristanC	ianw: yes that's expected, the service worker doesn't let you query the api directly, and it will try to interpret those url as html5 links	04:24
tristanC	well we may be able to trick it into loading the actual link, but that's not implemented yet	04:25
*** spsurya has joined #zuul		05:40
*** badboy has joined #zuul		06:27
*** quiquell\|off is now known as quiquell		06:44
quiquell	ianw, tristanC: We have been testing the new zuul version and it breaks zuul-web	07:28
tristanC	quiquell: what's breaking?	07:28
quiquell	tristanC: web UI the API is good	07:29
quiquell	tristanC: Can be our setup, do you know if there are any known issue with that ?	07:29
tristanC	quiquell: we didn't have issues when updating sf-3.2 with zuul-3.5.0, what's your issue?	07:30
*** bjackman has quit IRC		07:30
quiquell	tristanC: ack, have to check it myself they told me today	07:32
tristanC	there may be an issue if you restart all the services at once, it seems like they try to create the new artifact table concurrently	07:33
tristanC	we mitigated that in software-factory by manually doing the alembic migration before starting the services	07:34
*** gtema has joined #zuul		07:38
quiquell	tristanC: humm good to know, let me test myself will go back with more info	07:39
quiquell	panda\|off: this is good now ? https://review.rdoproject.org/r/#/c/18475/	07:40
tristanC	quiquell: btw, have you checked the zuul-runner change, it lets you run a job locally without the services, just direct ansible-playbook: https://review.openstack.org/632064	07:41
quiquell	tristanC: holy sh... really ?	07:44
quiquell	tristanC: we will test that, that would be fantastic,	07:45
quiquell	but the review seems unrelated to that	07:46
quiquell	is the correct one ?	07:46
tristanC	quiquell: zuul-runner (topic:freeze_job) is mentioned on https://tree.taiga.io/project/tripleo-ci-board/epic/5 ...	07:46
tristanC	quiquell: that last review adds --depends-on argument to run local job with speculative change	07:47
tristanC	quiquell: you need the whole patch stack to make it work	07:47
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: config: add playbooks to job.toDict() https://review.openstack.org/621343	07:48
quiquell	tristanC: This is exactly what we needed, let's work out next sprints	07:49
quiquell	tristanC: thanks so much	07:49
*** bhavikdbavishi has quit IRC		07:50
tristanC	quiquell: well we need to groom what is missing, such as secrets substitution map and some sort of embedded nodepool-launcher to manage instances lifecycle	07:51
quiquell	tristanC: but is like the right path	07:52
quiquell	tristanC: we can help with that	07:52
tristanC	quiquell: right now, you need to give the tests instances as command line parameters	07:52
quiquell	tristanC: will test at idle brain cycles	07:52
tristanC	we may deploy the new rest api on rdoproject.org, so that you can directly try the zuul-runner client	07:53
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: config: add tenant.toDict() method and REST endpoint https://review.openstack.org/621344	07:55
quiquell	tristanC: man this is fantastic work	07:55
tristanC	quiquell: heh you're welcome, but jhesketh did most of the work	07:56
quiquell	tristanC: would be nice to not need the whole zuul enchilada to run jobs	07:57
*** quiquell is now known as quiquell\|brb		08:01
*** panda\|off is now known as panda		08:04
*** bjackman has joined #zuul		08:04
*** quiquell\|brb is now known as quiquell		08:19
*** gtema has quit IRC		08:26
*** bhavikdbavishi has joined #zuul		08:26
*** mhu has joined #zuul		08:36
*** avass has joined #zuul		08:46
avass	does anyone have experience with trying to run programs interactively, preferably under a set session on windows through zuul/ansible?	08:46
*** jpena\|off is now known as jpena		08:51
*** hashar has joined #zuul		09:09
*** strigazi has quit IRC		09:15
badboy	I think there's a bug in Nodepool and/or Zuul when installing through pip3	09:36
*** ssbarnea\|bkp2 has joined #zuul		09:38
badboy	Nodepool installs kubernetes-8.0.1 which installs urllib3-1.24 and chardet-3.0.4 but Zuul needs urllib3-1.22 and chardet-3.0.2	09:38
badboy	installing kubernetes-8.0.0 and then forcing the install of urllib3-1.22 and chardet-3.0.2 resolves the issue	09:38
*** ssbarnea\|rover has quit IRC		09:39
*** ssbarnea\|bkp2 has quit IRC		09:59
*** ssbarnea\|rover has joined #zuul		10:00
*** ssbarnea\|rover has quit IRC		10:01
*** ssbarnea has joined #zuul		10:01
openstackgerrit	Jean-Philippe Evrard proposed openstack-infra/zuul-jobs master: Allow different filenames for Dockerfiles https://review.openstack.org/632979	10:02
*** luizbag has joined #zuul		10:03
*** ssbarnea\|bkp2 has joined #zuul		10:05
*** ssbarnea has quit IRC		10:07
openstackgerrit	Matthieu Huin proposed openstack-infra/zuul-jobs master: install-nodejs: add support for RPM-based OSes https://review.openstack.org/631049	10:38
badboy	is it possible to watch only a particular branch in Zuul?	11:06
avass	badboy: something like this https://zuul-ci.org/docs/zuul/admin/drivers/gerrit.html#attr-pipeline.trigger.%3Cgerrit%20source%3E.branch ?	11:07
badboy	avass: probably that's it	11:09
badboy	avass: whats the syntax? pipeline.trigger.my-gerrit-server.my-repo.my-branch?	11:09
avass	badboy: not sure, i got it to work by setting an event branch specific	11:11
badboy	avass: would you mind sharing your config?	11:11
avass	http://paste.openstack.org/show/743453/	11:13
badboy	avass: thx	11:14
badboy	avass: what's mysql for?	11:14
avass	badboy: it's so it reports to the mysql server which stores build status that's found at <tenant>/builds on the web ui	11:17
badboy	avass: without mysql the list of builds is empty?	11:20
avass	badboy: yes, or at least that's my experience	11:20
jkt	badboy: and of course it supports many other SQL servers as well (I'm running it with postgres for example)	11:22
badboy	avass: well that's another thing that's not mentioned in the docs	11:22
*** hashar has quit IRC		11:30
avass	is it possible for nodeset labels to be a list of labels so it requests one node with any of the labels listed?	11:32
*** bhavikdbavishi has quit IRC		11:33
*** hashar has joined #zuul		11:47
*** hashar has quit IRC		11:48
tobiash	avass: afaik no	12:23
tobiash	corvus, mordred: btw, I've understood our timeout problems that happened yesterday. The root cause was io slowness on the nodes causing the hard coded 60s timeout of the setup playbook to be exceeded.	12:24
tobiash	so no bug in reconfiguration around job timeouts	12:24
tobiash	phew	12:24
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul master: Improve logging around ansible timeouts https://review.openstack.org/633191	12:31
*** quiquell is now known as quiquell\|food		12:41
*** sanjayu_ has joined #zuul		12:43
*** jpena is now known as jpena\|lunch		12:44
*** sanjayu_ has quit IRC		12:53
*** bhavikdbavishi has joined #zuul		13:01
avass	tobiash: alright	13:01
*** hashar has joined #zuul		13:03
*** bjackman_ has joined #zuul		13:04
*** bjackman has quit IRC		13:07
*** bjackman_ has quit IRC		13:07
*** bjackman has joined #zuul		13:13
*** gtema has joined #zuul		13:17
*** bjackman has quit IRC		13:18
*** quiquell\|food is now known as quiquell		13:27
*** rlandy has joined #zuul		13:31
*** bjackman has joined #zuul		13:41
*** jpena\|lunch is now known as jpena		13:49
mordred	tobiash: OH GOOD	13:54
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul master: Make setup playbook timeout configurable https://review.openstack.org/633206	13:54
*** bhavikdbavishi has quit IRC		13:56
*** badboy has quit IRC		14:11
*** gtema has quit IRC		14:17
*** bjackman has quit IRC		14:21
*** gtema has joined #zuul		14:22
pabelanger	think I found an issue using add_host in untrusted job on executor, and I think ansible or zuul is handling wrong exit code: http://paste.openstack.org/show/743470/	14:45
pabelanger	I think what happened here, is I used the wrong SSH public key in authorized_keys on bastion.poctron.xyz, and ansible failed with unreachable	14:45
pabelanger	however return code is -4, which zuul thinks is a parse error	14:46
pabelanger	this seems to case post-run playbooks not to run, and zuul retries the job, eventually hitting retry_limit but not uploading logs (since post-run doesn't execute)	14:46
pabelanger	not sure the right way forward here, as me being non zuul ops on sf.io, I had to ask jpena for executor logs, but only because I knew something was up.	14:47
pabelanger	mordred: tobiash: corvus: ^thoughts	14:47
pabelanger	s/case/cause	14:48
tobiash	pabelanger: unreachable is treated as infrastructure failure which causes a job retry	14:49
tobiash	so ansible is correct with unreachable and zuul is correct with retry actually	14:49
pabelanger	tobiash: right, but in this case, is wants node nodepool provided, but once I used add_host outside of zuul	14:50
tobiash	maybe the problem is that we may want to still run the post playbooks in case of the last try	14:50
pabelanger	err	14:50
pabelanger	it wasn't node*	14:50
tobiash	yeah, I understand your use case	14:50
pabelanger	I can't seem to remember why we don't run post-run jobs in that case, logs could be helpful	14:51
tobiash	maybe we need a way to still continue (at least on the last try) to at least try to run the post playbooks on unreachable	14:51
pabelanger	let me see why we don't run post-run by default, maybe something in code	14:52
tobiash	because we anticipate an infrastructure failure and if we retry there won't be any reporting about the failed attempt to the user	14:52
mordred	I think it's an optimization	14:52
mordred	because if we got unreachable, the assumption is that the nodepool nodes are broken, so what's the point of trying to run post	14:52
mordred	(clearly post also won't work, because the nodes are unreachable)	14:52
mordred	but use of add_host in untrusted muddies this a bit	14:53
tobiash	and currently only the latest normal run is reported	14:53
pabelanger	yah, this is more a multinode job now, which 1 host is failing. So, possible we could get logs from other to help identify failures	14:53
tobiash	of course add_host gets hard to debug in this case	14:53
mordred	yah	14:53
pabelanger	wonder if we have some job flag to force the post-run playbook, on unreachable. kinda like we do for debug	14:54
pabelanger	or maybe we just enable it for debug flag	14:54
*** gtema has quit IRC		14:55
tobiash	the point is that we only report the last attempt to the user so running the post playbooks for any non-last attempts is wasted time	14:55
tobiash	but maybe we should start thinking about also making the logs of all retries available and then force the execution of the post playbooks	14:56
pabelanger	well, I think there is also the use case of intermittent issues, where first run fails, but 2nd passes. In that case, we don't get any logs and last failure post-run won't really work, but that is outside of this current issue	14:57
tobiash	we could make the earlier retries available in the web ui on the buildset page	14:57
tobiash	also this goes into that direction: https://review.openstack.org/632727	14:57
tobiash	this is intended to improve ways to analyze infrastructure weaknesses that lead to retries	14:58
pabelanger	yes, as a zuul operator, those (retry) logs are helpful. users, maybe not	14:58
tobiash	well, even for the users if they interact with project specific infrastructure in pre playbooks	14:58
pabelanger	tobiash: yay on 632727 will review today, something I've also wanted	14:59
tobiash	so I'd find it very useful if we record the failed builds in the database that lead to the retry	14:59
pabelanger	+1	14:59
tobiash	that would make it possible to spot them in the builds tab too	15:00
tobiash	and if we'd do that, we should probably enforce running the post playbooks to at least try to log something	15:00
tobiash	btw, how does this work? https://zuul.openstack.org/build/5514e07eb6da4b9ea8816ecd45f8357f	15:02
tobiash	just noticed that zuul summarizes the failed task in zuul-web	15:02
avass	how do i override the timeout to indefinite?	15:05
avass	tried setting it to 0 but that just actually set it to 0 hehe	15:05
pabelanger	don't think we support indefinite for timeout	15:05
avass	pabelanger: it's supposed to be indefinite if it's not set	15:06
avass	pabelanger: is there any way to 'unset' it? i guess setting a very long timeout works as well	15:07
pabelanger	avass: where do you see that? indefinite if not set	15:07
*** spsurya has quit IRC		15:07
avass	pabelanger: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.timeout	15:07
pabelanger	ah, I forgot about that	15:08
tobiash	avass: set it to -1	15:08
pabelanger	was just going to suggest that	15:08
tobiash	avass: but you need to also set the max_job_timeout of the tenant to -1	15:08
avass	tobiash: ah	15:09
avass	that explains why that didn't work either	15:09
tobiash	avass: but I'd suggest to use a high but finite timeout, otherwise you will have jobs lingering around forever in case a test case deadlocks	15:10
avass	tobiash: yeah, it's just while I'm testing things while setting everything up anyways	15:10
avass	tobiash, pabelanger: shouldn't the default timeout then be max_job_timeout and not indefinite?	15:13
tobiash	avass: yes: https://review.openstack.org/629552	15:13
tobiash	but I still need to address the comments	15:14
avass	what happens if the timeout is set higher than the limit?	15:24
avass	is it automatically 0 then?	15:24
mordred	tobiash, pabelanger: for the build page, we've also talked about having the executor put the log snippets for final-task-failures into the db and exposing them on that build page	15:25
mordred	it obviously requires us doing the 'db-is-required' first	15:26
tobiash	mordred: it already gets it via js from the log server	15:26
mordred	but I think should make some of these "only admins can look at the build logs to see what went horribly wrong" errors better	15:26
mordred	tobiash: yah - but there has to be a build log for that to work	15:26
tobiash	mordred: ah you mean the buildlog.json directly from the executor?	15:26
tobiash	great	15:27
tobiash	:)	15:27
mordred	we also capture the ansible output in case of ansible failure into the zuul executor logs - we could grab that text and put it into the db - not for all playbooks, but only if a final playbook fails	15:27
mordred	(capturing all of the ansible output would be a ton of things into the db that would almost always be ignored)	15:27
tobiash	mordred: well we have the job-output.json on the executor too (which is the same thing used now by the web ui)	15:27
mordred	tobiash: indeed. but I'm not sure we get the right things in job-output.json in cases where there was a broken ansible invocation or something	15:28
mordred	but yeah ... there's data somewhere that we should be able to capture and collect	15:28
mordred	in those catastrophic cases	15:28
tobiash	for this we have the syntax buffer we put into the buildlog.txt too	15:29
jkt	I wonder why I cannot use a role defined by my parent project directly from a project's .zuul.yaml	15:29
tobiash	that should be easy	15:29
mordred	tobiash: yah	15:29
jkt	I can sidestep that via a "forwarding" job in the parent project	15:29
tobiash	jkt: you can import the roles of the other project	15:29
tobiash	jkt: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.roles	15:30
*** spsurya has joined #zuul		15:30
pabelanger	Yay, got untrusted job working with add_host: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/34/344d8933e08a208b674451d83af440534bd27590/post/windmill-config-deploy2/6821fe6/ara-report/	15:30
pabelanger	bastion2.yaml is playbook of interest	15:30
tobiash	jkt: example: https://git.zuul-ci.org/cgit/zuul/tree/.zuul.yaml#n66	15:31
tobiash	pabelanger: yay :)	15:31
pabelanger	mordred: tobiash: so, I am a little torn on opening zuul_console ports on production server. Or am I over thinking this? I guess I could firewall and only allow zuul-executors access to it	15:32
pabelanger	maybe I'll just wait until new logging work is finished for that	15:32
pabelanger	since ARA works as expected	15:32
jkt	my usecase: just have a job which runs that child project's ci/build.sh, so something like run-test-command from zuul-jobs, but without inheriting from your 'unittest' job	15:33
jkt	tobiash: thanks	15:33
jkt	I think I'll go with that tiny forwarding job, it's actually a bit shorter	15:33
jkt	tobiash: can I import playbooks from a parent project?	15:41
tobiash	jkt: no, that's not possible	15:42
tobiash	only roles	15:42
pabelanger	yah, usually that case you'd parent to the job with playbooks you'd like to use	15:42
jkt	tobiash: thanks	15:43
jkt	pabelanger: see above for what I'm trying to do; I guess I'll simply copy that playbook into my job, then	15:43
pabelanger	jkt: so, you want to use run-test-command, but not parent to unittests right?	15:46
jkt	pabelanger: yes, that's what I wanted to do	15:47
*** avass has quit IRC		15:47
jkt	pabelanger: but anyway, it's really just a trivial ansible playbook, I just copied it	15:47
pabelanger	yah, I've often wanted to do that myself for reasons, but end up having to copy the playbooks into site specific zuul-jobs.	15:48
*** cmurphy is now known as cmorpheus		15:51
*** quiquell is now known as quiquell\|off		15:56
jkt	why am I getting a DISK_FULL message (zuul.ExecutorDiskAccountant: /tmp/tmpdzo80fgc/21510f7fe3d44b139b5e95e098884ead is using 280MB (limit=250)) like this one: http://paste.openstack.org/show/743480/	16:11
jkt	the disks on the executor and on the assigned node are definitely not full	16:11
jkt	that path does not exist on the executor, anyway (unless someone is playing with mount namespaces...)	16:13
*** bhavikdbavishi has joined #zuul		16:14
jkt	ah, so it's probably due to the size of the git repo itself and https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.disk_limit_per_job	16:18
*** bhavikdbavishi has quit IRC		16:18
jkt	it's nice that zuul's sources are so easily grepable :)	16:18
corvus	jkt: yes -- one thing to note is that zuul clones repos into the workspaces from its own internal cache, and as long as they are on the same filesystem, it should use hard links, so that much of the git repo data (ie, the blobs) won't count against the history. there's still a lot of local data (including the working directory), so some percentage of the repo will still take up space.	16:22
corvus	er, s/count against the history/count against the limit/	16:22
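
A minimal sketch of the hard-link point above, assuming a POSIX-like filesystem: two hard-linked paths share one inode, so the data is stored only once. The file names are made up for illustration, and this is not how zuul's disk accountant is implemented.

```python
import os
import tempfile


def shares_storage(path_a: str, path_b: str) -> bool:
    """Return True if both paths are hard links to the same inode."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo() -> tuple[bool, int]:
    """Create a file, hard-link it, and report (same inode?, link count)."""
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "cache_blob")        # hypothetical cache file
        linked = os.path.join(tmp, "workspace_blob")    # hypothetical workspace copy
        with open(cached, "wb") as f:
            f.write(b"x" * 4096)
        # os.link only works when both paths are on the same filesystem,
        # which is why the cache and workspaces need to share one.
        os.link(cached, linked)
        return shares_storage(cached, linked), os.stat(cached).st_nlink
```
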
*** panda is now known as panda\|off		16:23
jkt	corvus: yup, my repo was about 500MB in size	16:24
jkt	now, I'm trying to run a simple shell script on my build nodes; they are fedora 29 provisioned statically by nodepool	16:25
jkt	I keep getting an error "Timeout exception waiting for the logger. Please check connectivity to [147.251.253.10:19885]"	16:25
jkt	what sorts of communication is required to be open? I thought that it was all done via ansible over SSH	16:25
corvus	pabelanger, tobiash: we could have the scheduler tell the executor if this is the last retry of a job and have it run post playbooks in that case.	16:26
corvus	jkt: port 19885 needs to be open on the worker nodes in order for log streaming to work	16:26
corvus	jkt: we have a plan to eliminate that requirement, but work is not complete yet.	16:26
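
A quick way to check the port requirement above from the executor's side is a plain TCP connect. This is only a sketch: `port_is_open` is a hypothetical helper, not part of zuul, and it tests reachability only, not whether the log streamer itself is healthy.

```python
import socket


def port_is_open(host: str, port: int = 19885, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it on the executor against a worker node's address; a `False` result usually points at a firewall or security group rule.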
jkt	nice, here it is, https://softwarefactory-project.io/docs/operator/nodepool_operator.html#add-a-cloud-provider	16:27
pabelanger	corvus: tobiash: wfm	16:28
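
The proposal above (run post playbooks on unreachable only when it is the last retry) can be sketched as a small decision function. All names here are hypothetical and this is not zuul's executor code; it only illustrates the rule being discussed.

```python
def should_run_post(unreachable: bool, attempt: int, max_attempts: int) -> bool:
    """Decide whether post playbooks should run after the main run.

    unreachable:  the run ended because a host was unreachable
    attempt:      1-based number of the current attempt
    max_attempts: total attempts allowed before RETRY_LIMIT
    """
    if not unreachable:
        # Normal success or failure: post playbooks always run.
        return True
    # Unreachable: earlier attempts are retried and never reported,
    # so only the final attempt is worth collecting logs for.
    return attempt >= max_attempts
```
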
corvus	nice. that should probably be in zuul's documentation.	16:28
jkt	corvus: I'll propose a patch for this one	16:29
tobiash	corvus: ++	16:29
*** hashar has quit IRC		16:30
corvus	tobiash: how long does your setup take?	16:37
corvus	tobiash: (and why?)	16:37
tobiash	corvus: we had ceph performance issues so I guess a limit of two minutes would have helped there	16:38
tobiash	I need to dig deeper into the logs to check the usual duration	16:40
corvus	ok, so we probably don't need to update the default. i can see how being able to change the value might be necessary. +2	16:40
tobiash	corvus: :)	16:41
tobiash	corvus: I think we should also add the playbook which is being killed to the timeout log	16:42
corvus	wfm	16:42
corvus	tobiash: (though, you should probably be able to figure that out by looking at previous entries)	16:42
corvus	(but it's fine to add it to make it easier)	16:42
tobiash	yeah, but having it there makes it easy to query kibana for "Ansible timeout exceeded" and make it possible to directly filter for e.g. the setup playbook	16:43
*** bhavikdbavishi has joined #zuul		16:46
corvus	pabelanger: did you ever get to the bottom of the exclude pipeline issue?	16:59
pabelanger	corvus: not yet, I can start looking at it again once i finish converting this job to deploy directly from executor	17:03
pabelanger	hopefully this afternoon	17:03
openstackgerrit	Merged openstack-infra/zuul master: Explicitly callout ZooKeeper as ext dependency https://review.openstack.org/632732	17:06
openstackgerrit	Jan Kundrát proposed openstack-infra/zuul master: Incoming connections over 19885/TCP are needed on nodes https://review.openstack.org/633242	17:11
jkt	what is the most zuul-ish way of checking out git submodules?	17:12
jkt	I'm using them for projects which are available from gerrit (and some of them requiring proper credentials, i.e., not an anonymous access)	17:13
tobiash	jkt: not doing it ;)	17:13
jkt	tobiash: :)	17:13
tobiash	jkt: just kidding	17:13
*** mrhillsman is now known as mrhillsman_lunch		17:13
jkt	I'm afraid they are the least evil thing	17:13
corvus	i wish mordred were around, he just did some stuff with that... let me see if i can dig some stuff up	17:13
tobiash	jkt: zuul doesn't do submodule handling itself	17:13
jkt	I'm mirroring some C projects from github to our gerrit, and they do not have a stable API, so I have to somehow pin them	17:14
jkt	i.e., I cannot use zuul's cross-project tracking, really	17:14
jkt	s/tracking/gating/	17:14
tobiash	what we do is we add the submodules to required-projects and patch the url to the according repo in the workspace and then initializing the submodules	17:14
tobiash	(during job runtime)	17:14
corvus	jkt: since you're mirroring them, could you add your own tags?	17:14
corvus	tobiash: yes, that sounds like what mordred ended up doing too...	17:15
jkt	corvus: my typical workflow involves patching my code in a leaf project to adapt to a new API, and bumping the submodule commit hash in .gitmodules	17:15
jkt	corvus: this has worked well for us for years with zuul v2 and turbo-hipster	17:15
tobiash	some of our projects even use branches as moving targets, add speculative submodule states and update the base repo in a post job	17:16
jkt	tobiash: is there a playbook for this auto-checkout? I noticed that the "origin" remote is /dev/null	17:16
tobiash	jkt: the trick is to add them as required-project in the job	17:16
tobiash	then all necessary repos will be synced to the node	17:16
corvus	jkt: here's what mordred did with submodules: http://git.openstack.org/cgit/openstack-infra/system-config/tree/.zuul.yaml#n149	17:16
corvus	i think that matches what tobiash is saying	17:16
corvus	but you probably won't use override-checkout	17:17
tobiash	and then you can patch the remote urls of the submodules to the local repos on disk	17:17
jkt	thanks, I think I understand that	17:17
jkt	is there a playbook for this auto-patching? :)	17:17
corvus	you can do that in a pre-playbook	17:17
corvus	mordred ended up not patching the urls, but instead, moved the submodule git repos into place: http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/zuul/gerrit/repos.yaml	17:17
tobiash	we didn't upstream one yet	17:17
tobiash	that's the other possibility	17:18
tobiash	moving is probably more efficient than patching url and initialize	17:18
corvus	(of course, you still need to check out the right sha)	17:18
corvus	(mordred's playbook doesn't do that since we had zuul check out the one we wanted)	17:19
corvus	btw, if one wanted to upstream something, i think one could make that first task, where the repos are moved, generic by iterating over zuul.projects	17:19
corvus	(filtering for required=True)	17:20
corvus	or by reading .gitmodules	17:20
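
A rough sketch of the "read .gitmodules" idea, assuming the usual `[submodule "name"]` layout. `parse_gitmodules` and `local_url_commands` are hypothetical helpers, and the `local_repos` mapping (submodule name to the on-node checkout of the matching required-project) is something the job would have to supply. The generated `git config submodule.<name>.url` commands are meant to run after `git submodule init` and before `git submodule update`.

```python
import re

SECTION = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]$')


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Parse .gitmodules content into {name: {"path": ..., "url": ...}}."""
    modules: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        match = SECTION.match(line)
        if match:
            current = modules.setdefault(match.group("name"), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


def local_url_commands(modules: dict[str, dict[str, str]],
                       local_repos: dict[str, str]) -> list[list[str]]:
    """Build git commands that point each known submodule at a local repo."""
    return [
        ["git", "config", f"submodule.{name}.url", local_repos[name]]
        for name in modules
        if name in local_repos
    ]
```

Each returned command is an argument list suitable for `subprocess.run(cmd, cwd=repo_dir, check=True)`. Checking out the right sha is still up to `git submodule update`.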
openstackgerrit	Merged openstack-infra/zuul master: Make setup playbook timeout configurable https://review.openstack.org/633206	17:26
openstackgerrit	Merged openstack-infra/zuul master: Improve logging around ansible timeouts https://review.openstack.org/633191	17:26
pabelanger	WOOT!	17:28
pabelanger	https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/a7/a700ecc334444b1102887f429e8125866c8ed1d1/post/windmill-config-deploy/62c4d25/job-output.txt	17:28
pabelanger	actually did CD directly from zuul-executor with untrusted job	17:28
pabelanger	Yay!	17:28
SpamapS	pabelanger: neat	17:29
pabelanger	it is still nested ansible, but add_host totally works	17:29
SpamapS	I think nested ansible is the way to go. Zuul's Ansible just isn't set up for large scale prod deployment.	17:30
SpamapS	We actually don't have any VM's to touch with our CD, just kubernetes and terraform, so a throw-away Zuul VM with secrets to talk to those works great.	17:30
SpamapS	What does not work great is "oh that failed because of X and now we need to retry.. fuuuu"	17:30
pabelanger	Yah, last time I tried, groups of groups was the blocker from running production playbooks from executor	17:30
SpamapS	We have to land pretend commits to retry :-/	17:30
corvus	SpamapS: if you have a minute, i think https://review.openstack.org/623927 is pretty close to being in shape and iirc, you had early thoughts on that.	17:33
SpamapS	oh yay	17:33
corvus	(i think we want to leave it open for more review, so don't +3 it :)	17:33
corvus	maybe i'll send out email early next week with an eye to merging it by the end of the week...	17:33
SpamapS	ACK, (yeah specs I tend to think we need more than just 2x+2's)	17:33
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Revert "Add a timeout for the image build" https://review.openstack.org/633252	17:36
*** jpena is now known as jpena\|off		17:44
*** pvinci has joined #zuul		17:45
pvinci	Hello. Is this the best forum to discuss proposals for changes?	17:47
openstackgerrit	Merged openstack-infra/zuul master: executor: properly format error exception https://review.openstack.org/630928	17:49
tobiash	pvinci: if you mean changes to zuul, then yes :)	17:49
corvus	pvinci: it is a forum to do so, and a good place to start, and at the very least, if a different forum is appropriate, we can figure that out here :)	17:49
pvinci	I ran into an issue with Repo in merger.py failing due to an unknown hostkey. I threw in 10 lines of code to work past it, but it's too trusting of a position to take. I think the gerrit driver should be extended with a 'hostkey' entry for verifying the end host system when using ssh.	17:53
corvus	pvinci: hrm, i thought we had the mergers and executors automatically accept a new host key the first time. and if you want to provide a host key, you can write a .ssh/known_hosts file...	17:55
corvus	pvinci: this code should auto-add the host key if it isn't there already: http://git.zuul-ci.org/cgit/zuul/tree/zuul/merger/merger.py#n109	17:56
*** bhavikdbavishi has quit IRC		17:57
pvinci	That's the code I changed. See the pass in line 122?	17:58
pvinci	if not os.path.exists(path): seen = False with open(path, 'r') as kh: for line in kh.readlines(): if line.startswith(url.hostname): seen = True break if not seen: with open(path, 'a') as kh: kh.write(self.hostkey)	17:58
pvinci	guess that doesn't work here. ;)	17:58
corvus	pvinci: you can copy/paste your code into http://paste.openstack.org/ and then copy the resulting url here to share it	17:59
pvinci	http://paste.openstack.org/show/743484/	17:59
corvus	pvinci: but i think i understand your change. what i don't understand is what problem you ran into that you need to solve. did the key change?	18:00
pvinci	git.exc.GitCommandError: Cmd('git') failed due to: exit code(128) cmdline: git clone ssh://xxx /var/lib/zuul/executor-git/xxx stderr: 'Cloning into '/var/lib/zuul/executor-git/xxx'... Warning: Identity file /var/ssh/xxx not accessible: No such file or directory. Host key verification failed. fatal: Could not read from remote repository.	18:05
corvus	pvinci: is there a log line that precedes that which starts with "Unable to set up known_hosts" ?	18:07
pvinci	There is an Exception: Signature verification (ssh-ed25519) failed.	18:09
corvus	pvinci: are you using gerrit or github?	18:12
pvinci	gerrit	18:12
corvus	pvinci: what version?	18:12
pvinci	2.14.6	18:13
corvus	pvinci: you may be running into this bug: https://bugs.chromium.org/p/gerrit/issues/detail?id=6504	18:13
pvinci	Thanks! That is helpful.	18:15
*** mrhillsman_lunch is now known as mrhillsman		18:15
corvus	pvinci: i believe in that case we may not be able to automatically do the right thing in zuul. so i think you are correct to solve this by writing the host key to known_hosts. but i think rather than updating zuul to do this, it may be simpler for you to write the known_hosts file yourself. as long as it's there before zuul starts, it will use it.	18:15
pvinci	Yes. I thought about having ansible drop in the file, but that seemed like the long way around.	18:16
pvinci	I can do that.	18:16
corvus	pvinci: if we were to add host key support to zuul like you suggest, it would be a bit more work since we have to support multiple connections, so we'd really have to manage multiple entries, and the whole lifecycle of the file.	18:16
pvinci	I can add it via ansible. Thanks.	18:17
corvus	pvinci: yeah, i think that "have ansible drop in the file" is probably the way most folks are leaning right now. i could definitely see us going the other way though; it's a tough call. :)	18:17
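
For anyone who prefers a small script over ansible for this, a minimal sketch of pre-seeding known_hosts before zuul starts. It follows the same "append unless a line already starts with the hostname" logic as the snippet pvinci pasted; `ensure_known_host` is a hypothetical helper, and the plain prefix match is an assumption that will miss hashed entries and `[host]:port` style entries.

```python
import os


def ensure_known_host(path: str, hostname: str, hostkey_line: str) -> bool:
    """Append hostkey_line to the known_hosts file at path unless an entry
    starting with hostname already exists. Returns True if the file changed."""
    if os.path.exists(path):
        with open(path, "r") as kh:
            for line in kh:
                if line.startswith(hostname):
                    return False  # already present, leave the file alone
    # Create the parent directory (e.g. ~/.ssh) if it is missing.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a") as kh:
        kh.write(hostkey_line.rstrip("\n") + "\n")
    return True
```

The host key line itself should come from a trusted source (for example, copied from the gerrit server's own key files), not from whatever the first connection presents.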
openstackgerrit	Merged openstack-infra/zuul master: Fix noop job toDict() https://review.openstack.org/630409	18:17
pvinci	Also, how do you feel about doing shallow git clones instead of the full clones we do now?	18:20
corvus	pvinci: a lot of testing requires the full history, or the ability to change branches, so a full clone is the most universally applicable. i'm also not sure that all the speculative merges would succeed or produce the same output without the full clones. however, it's worth noting that as long as the executor's merger and build directories are on the same filesystem, they will use hard links and so the	18:22
corvus	disk space cost is much lower.	18:22
pvinci	ok.	18:26
pvinci	One last thing, then I'll leave you alone for a while. I see this in the logs as well. AttributeError: 'MergeJob' object has no attribute 'updated'.	18:28
corvus	pvinci: it's harmless, we should fix the log line to remove that :)	18:28
pvinci	ok. Thanks. I can contribute that. Just wasn't sure that it wouldn't mask an issue.	18:30
corvus	i think it's just a very old log line which didn't keep up with the changes around it. should be fine to clean up, thanks!	18:30
pvinci	My use case here is a little different. I'm relying on zuul to trigger on specific changes in an upstream project.	18:36
corvus	pvinci: that's great! you're not alone though, several folks here (openstack and openlab come to mind) do that	18:38
tobiash	corvus, pvinci: I've seen this attribute error when there was no merger that responded to the 'cat' request	18:41
pvinci	What is "cat"?	18:41
tobiash	zuul relies on the zuul-executors and mergers to get the configuration from the git repos	18:41
tobiash	it uses gearman jobs to distribute that	18:42
tobiash	and one operation is 'cat' that asks the merger 'give me all zuul.yaml filed from that repo on branch x'	18:42
tobiash	s/filed/files	18:43
corvus	(like "git cat-file")	18:43
tobiash	corvus: what I mean is here: https://git.zuul-ci.org/cgit/zuul/tree/zuul/configloader.py#n1590	18:45
corvus	oh, so we might be masking that error	18:46
tobiash	if that times out, the job has no updated attribute and the scheduler is doomed	18:46
corvus	same result, wrong error message	18:46
tobiash	yes, we get some stack trace but actually want to know that we couldn't fetch some data from a repo	18:47
tobiash	further we escape from the whole loop then	18:47
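A rough sketch of the idea being discussed: report "couldn't fetch data from the repo" rather than letting a missing attribute surface as an AttributeError. This is not the code in configloader.py; `CatJobError` and `ensure_cat_succeeded` are hypothetical names, and the only detail taken from the conversation is that a job which timed out has no `updated` attribute.

```python
class CatJobError(Exception):
    """Raised when no merger returned config files for a project/branch."""


def ensure_cat_succeeded(job, project: str, branch: str) -> None:
    """Raise a descriptive error if the 'cat' job did not complete.

    A job that never got a response has no `updated` attribute, so
    getattr() with a default is used instead of plain attribute access.
    """
    if not getattr(job, "updated", False):
        raise CatJobError(
            "cat job for %s on branch %s did not complete; "
            "is a merger or executor running?" % (project, branch)
        )
```

With a check like this, the failure names the affected project and branch rather than ending in an unrelated stack trace.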
pabelanger	I know why, but sometimes wish a job could set the timer value for a periodic pipeline	18:51
pabelanger	mordred: when you happen to be around again, I cannot seem to figure out why I am getting 'Waiting on logger' when using add_host from executor. I believe I have proper ports open on firewall: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/f5/f5d9ac249e1a53f82e8415ddbadde9006122c722/post/windmill-config-deploy/bb7b2c1/job-output.html#l114	18:56
pabelanger	eventually output is rendered, but not realtime, happens in blobs at end of task	18:57
corvus	pabelanger: do you start zuul_console?	18:59
corvus	http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/prepare-workspace/tasks/main.yaml#n1	18:59
pabelanger	corvus: yup!	18:59
pabelanger	https://github.com/ansible-network/windmill-config/blob/master/tests/playbooks/pre.yaml	19:00
Shrews	This is frustrating. nodepool-builder works locally for me.	19:04
clarkb	could it be an arm specific issue?	19:04
clarkb	nb03 is our arm builder	19:04
Shrews	The obvious thing to look at would be permissions (particularly for the --logfile option to disk-image-create), but that looks ok	19:05
Shrews	clarkb: i don't see how that would make any difference here	19:05
openstackgerrit	Paul Vinciguerra proposed openstack-infra/zuul master: configloader.py: Not all jobs have updated attribute. https://review.openstack.org/633259	19:05
Shrews	it built an arm image as recently as yesterday	19:06
corvus	Shrews: the revert failed on a flaky test, so we have time to change our minds about that. i've rechecked it.	19:06
Shrews	corvus: yeah, let's remove the +A from it so we can poke more	19:07
Shrews	i mean, i'm not sure what else to poke, but...	19:08
*** luizbag has quit IRC		19:08
corvus	Shrews: done. it'll take a while for the check-recheck anyway, hopefully we'll have a zuul+1 ready if we decide to +3 it again.	19:08
Shrews	i could temporarily change it to not use the --logfile option to see if it at least starts to build	19:09
Shrews	think i'll do that	19:09
*** bjackman has joined #zuul		19:12
Shrews	nope, that's not it	19:12
corvus	Shrews: as a sanity check, do you want to do a manual revert on nb03 and make sure that works?	19:13
corvus	Shrews: usually i check out a copy in my homedir at the right commit, then "sudo pip3 install ."	19:13
corvus	oh you may have already done that to do the --logfile thing anyway, huh. nm. :)	19:14
corvus	main point is, if '--logfile' isn't the issue, then i don't know what is and i wonder whether a revert would even fix it.	19:15
Shrews	corvus: i just edited the installed builder.py, didn't checkout from git	19:15
corvus	ah k. probably want to do the pip install thing to test the revert.	19:15
Shrews	corvus: yeah	19:16
Shrews	corvus: yup, revert works	19:20
Shrews	sigh	19:20
Shrews	what the actual fudge	19:20
jkt	I wonder what is the reason behind setting the origin remote's url to /dev/null	19:21
corvus	jkt: because the actual origin (zuul) is not accessible, and the nominal origin (gerrit/github) has the wrong data (it doesn't have the speculative future state that zuul does).	19:22
corvus	(also, the nominal origin may be inaccessible too, if it requires credentials)	19:23
jkt	corvus: okay, thanks... I'm thinking about how to implement these submodules I mentioned earlier	19:23
jkt	a project might specify a ref which is not part of the default branch's history, for example, and I think one needs to have origin available to fetch that	19:24
corvus	jkt: the repo on disk should have all of the upstream refs	19:24
corvus	not just the default branch, it should have all of them	19:24
jkt	okay, that's good to know	19:24
corvus	(this is, btw, a difference between zuulv2 and v3 -- v2 didn't always have that)	19:25
pabelanger	I actually ran into that issue recently with post pipeline job, origin was /dev/null now on production server. Okay for now, as going to try and have zuul push the repo over pulling it from github.com	19:26
jkt	so it's really "just" a matter of (for each cloned project) recursively list its submodules, assert that it's a relative URL, and update	19:26
corvus	jkt: that sounds correct to me, but i hide under my desk when anyone says submodules :)	19:26
tobiash	submodules	19:27
* corvus hides		19:28
jkt	corvus: according to the docs, `git submodule XXX` treats the relative URLs as relative to the default remote, i.e., origin	19:28
tobiash	:)	19:28
corvus	tobiash: ^ what do you do to your urls?	19:29
*** bjackman has quit IRC		19:29
tobiash	I'm not directly involved in that project but I believe they patch the submodule urls to be file:///home/zuul/src/<project>	19:32
tobiash	and then just git submodule update --init	19:33
corvus	ah that makes sense. jkt ^	19:33
tobiash	s/patch/patch in a pre-playbook	19:33
corvus	we should write this up and add it to https://zuul-ci.org/docs/zuul/user/howtos.html	19:34
tobiash	good idea	19:34
tobiash	so that's the 'easy' part of submodules	19:34
* corvus hides		19:35
tobiash	lol	19:35
tobiash	we have another project that uses a base repo, gates the submodules and updates the references in a post job	19:36
*** electrofelix has quit IRC		19:37
corvus	i just emitted a long email to zuul-discuss on speculative container images	19:38
corvus	that's item #1 on today's todo list.	19:39
corvus	(writing the email)	19:39
jkt	if I "just" add a call to https://gitpython.readthedocs.io/en/stable/reference.html#git.objects.submodule.root.RootModule.update into zuul/executor/server.py , perhaps guarded by an option (similar to override_*), would that be acceptable upstream?	19:40
tobiash	and I've another advice when using submodules: refrain from using recursive submodules, if possible	19:40
corvus	jkt: i think submodules are too confusing and dangerous for zuul to work with; i'd prefer to get a solid set of shared roles in zuul-jobs to work with them	19:40
jkt	tobiash: :(, we're working with boost.org	19:40
tobiash	jkt: you're doomed ;)	19:41
jkt	corvus: unless the jobs/whatever performs an ACL check within Zuul, there's a nice possibility of repo ACL bypass if the zuul uses a gerrit account that is powerful enough	19:42
tobiash	corvus: yes, that's what I suggest to any projects, if you want to work with submodules, don't try to tell zuul about it, handle it in the jobs and add all submodules to required projects so the respective repos are on disk on the node	19:42
tobiash	jkt: if you have restrictive acls you should split the projects by tenants in zuul	19:43
jkt	tobiash: and use separate gerrit users in there as well, true	19:43
corvus	jkt: do you mean if we supported submodules in zuul? yes, that's one of the concerns. if you mean now, then what tobiash says -- zuul won't check out projects which aren't in the tenant, so if you don't add that project to that tenant, zuul won't let it be used in required-projects.	19:44
corvus	(which is something we should keep in mind if we add support for implicit required-projects :)	19:44
jkt	corvus: well, a simple implementation of submodules would just call `git submodule update --init` from within a trusted playbook, right?	19:45
jkt	that doesn't check required-projects	19:45
tobiash	jkt: that won't work, as the node has no credentials at all to access gerrit	19:45
Shrews	corvus: i'm stumped. i've output the ENV vars being used, duplicated those in my shell, and ran the command (as nodepool) output in the log and it works. This points to something with the Popen() call, but that works locally for me.	19:46
tobiash	zuul prepares it according to the job and what's allowed to the tenant and pushes the prepared repos to the node	19:46
jkt	tobiash: I meant a pre-playbook, running on the executor/merger/whatever	19:46
tobiash	the job itself has no access to the scm	19:46
tobiash	jkt: even a trusted pre-playbook doesn't have access to the scm	19:46
jkt	ah well	19:46
jkt	that rules out that simple method, then	19:46
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects https://review.openstack.org/613143	19:47
corvus	^ left a note for myself to remember to deal with that case :)	19:47
jkt	so I would have to ask git checkout for a list of submodules, massage their URLs to the already existing checkouts, `git submodule update --init`	19:47
Shrews	though i'm running python 3.6.7 locally, builders have 3.5.2, so maybe a lib difference?	19:47
corvus	Shrews: i can take a look after lunch	19:48
tobiash	jkt: yes	19:48
jkt	that's quite some rewriting, especially if this is to work recursively	19:48
jkt	-> not a job for a Friday evening.	19:48
tobiash	jkt: that's why I said, refrain from recursive submodules ;)	19:48
jkt	ECANNOT :(	19:49
tobiash	then you'll need to do this rewriting recursively	19:49
openstackgerrit	Paul Vinciguerra proposed openstack-infra/zuul master: configloader.py: Not all jobs have an .updated attribute. https://review.openstack.org/633259	19:49
corvus	we should definitely make that a role and put it in zuul-jobs.	19:50
tobiash	jkt: my advice for this: don't try this with distinct standard ansible tasks, use a custom python module	19:50
corvus	we actually have a test framework in zuul-jobs for roles with python modules, so we should be able to get this pretty solid.	19:51
tobiash	the non-recursive would be relatively easy per shell tasks, but for the full blown recursive version I'd prefer a python module and as corvus says using the zuul-jobs test framework :)	19:51
corvus	ug http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/upload-logs-swift/library/test_zuul_swift_upload.py	19:51
corvus	that was supposed to be "e.g.," not "ug".	19:52
corvus	i'm gonna go get lunch now.	19:52
jkt	thanks a lot for these, I'll be happy to contribute this once I get it working	19:53
jkt	next week :)	19:53
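A minimal, non-recursive sketch of the submodule handling described above: point each submodule URL at the checkout Zuul already placed on the node, then run `git submodule update --init`. The function names and the name-to-project mapping are assumptions for illustration, and `/home/zuul/src` is the path mentioned in the discussion. A recursive version would repeat the same steps inside each submodule after it is checked out.

```python
import subprocess
from pathlib import PurePosixPath


def submodule_commands(mapping, src_root="/home/zuul/src"):
    """Build the git commands that redirect submodules to local checkouts.

    `mapping` maps a submodule name (as in .gitmodules) to the project's
    path below `src_root`. Nothing is executed here.
    """
    cmds = []
    for name, project in sorted(mapping.items()):
        url = "file://" + str(PurePosixPath(src_root) / project)
        cmds.append(["git", "config", "--file", ".gitmodules",
                     "submodule.%s.url" % name, url])
    # Propagate the patched URLs to any already-initialised submodules,
    # then check the submodules out from the local repos.
    cmds.append(["git", "submodule", "sync"])
    cmds.append(["git", "submodule", "update", "--init"])
    return cmds


def run_commands(cmds, repo_dir):
    """Run the prepared commands inside the superproject checkout."""
    for cmd in cmds:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Every project in the mapping has to be listed in the job's required-projects so that its repo is on disk before these commands run.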
*** spsurya has quit IRC		20:37
*** pvinci has quit IRC		21:06
dmsimard	We're getting close enough to a release of ARA 1.0 that I've started slowly looking at what it would mean for Zuul -- I've created an etherpad to highlight what's new and some ideas to help foster discussion: https://etherpad.openstack.org/p/ara-1.0-in-zuul	21:17
dmsimard	It's still a work in progress and it doesn't have all the answers yet but hopefully it's a good start :p	21:19
dmsimard	I'm excited about the API and some of the other features but I also feel like a lot of that depends on the amount of reliance (or coupling) Zuul is interested in having with ara	21:21
dmsimard	We're friday on the way out so I can send an email to zuul-discuss too if it's appropriate :)	21:22
*** rlandy has quit IRC		21:37
openstackgerrit	Merged openstack-infra/nodepool master: Revert "Add a timeout for the image build" https://review.openstack.org/633252	22:37
