Friday, 2019-01-25

*** rlandy has quit IRC00:48
pabelangerI'm looking into project-ssh-key for openstack, but for some reason they don't load properly: https://zuul.openstack.org/api/project-ssh-key/openstack-infra/project-config.pub02:33
pabelangerokay, thanks to help from tristanC curl does work02:46
*** bhavikdbavishi has joined #zuul02:47
tristanCpabelanger: fwiw the url also loads fine with Firefox02:50
*** bhavikdbavishi has quit IRC02:52
*** chandankumar has quit IRC02:57
*** chandankumar has joined #zuul02:59
*** bhavikdbavishi has joined #zuul02:59
*** bjackman has quit IRC03:25
pabelangertristanC: hmm, so maybe just chrome. I actually get an error: http://paste.openstack.org/show/743437/03:27
ianwhow bizarre, i paste that link into my firefox and get a non-working zuul status page03:31
ianwbut then shift reload it, and get the key03:31
ianwsome sort of weird caching thing?03:31
*** bjackman has joined #zuul03:34
*** sanjayu_ has joined #zuul03:35
pabelangerrefresh doesn't work for me on chrome03:38
pabelangerI just see Fetching info...03:39
*** sanjayu_ has quit IRC03:39
ianwtristanC / pabelanger: yeah -> https://imgur.com/a/jqWjNlO03:50
ianwit looks like project-config.pub is served up from a service worker03:50
ianwwhich messes up the app ... when i do a shift-reload it actually grabs it from the remote end03:52
*** bhavikdbavishi has quit IRC04:03
*** bhavikdbavishi has joined #zuul04:12
tristanCianw: yes that's expected, the service worker doesn't let you query the api directly, and it will try to interpret those urls as html5 links04:24
tristanCwell we may be able to trick it into loading the actual link, but that's not implemented yet04:25
*** spsurya has joined #zuul05:40
*** badboy has joined #zuul06:27
*** quiquell|off is now known as quiquell06:44
quiquellianw, tristanC: We have been testing a new zuul version and it breaks zuul-web07:28
tristanCquiquell: what's breaking?07:28
quiquelltristanC: web UI the API is good07:29
quiquelltristanC: Can be our setup, do you know if there are any known issue with that ?07:29
tristanCquiquell: we didn't have issues when updating sf-3.2 with zuul-3.5.0, what's your issue?07:30
*** bjackman has quit IRC07:30
quiquelltristanC: ack, have to check it myself they told me today07:32
tristanCthere may be an issue if you restart all the services at once, it seems like they try to create the new artifact table concurrently07:33
tristanCwe mitigated that in software-factory by manually doing the alembic migration before starting the services07:34
*** gtema has joined #zuul07:38
quiquelltristanC: humm good to know, let me test myself will go back with more info07:39
quiquellpanda|off: this is good now ? https://review.rdoproject.org/r/#/c/18475/07:40
tristanCquiquell: btw, have you checked the zuul-runner change, it lets you run a job locally without the services, just direct ansible-playbook: https://review.openstack.org/63206407:41
quiquelltristanC: holy sh... really ?07:44
quiquelltristanC: we will test that, that would be fantastic,07:45
quiquellbut the review seems unrelated to that07:46
quiquellis the correct one ?07:46
tristanCquiquell: zuul-runner (topic:freeze_job) is mentioned on https://tree.taiga.io/project/tripleo-ci-board/epic/5 ...07:46
tristanCquiquell: that last review adds --depends-on argument to run local job with speculative change07:47
tristanCquiquell: you need the whole patch stack to make it work07:47
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: config: add playbooks to job.toDict()  https://review.openstack.org/62134307:48
quiquelltristanC: This is exactly what we needed, let's work out next sprints07:49
quiquelltristanC: thanks so much07:49
*** bhavikdbavishi has quit IRC07:50
tristanCquiquell: well we need to groom what is missing, such as secrets substitution map and some sort of embedded nodepool-launcher to manage instances lifecycle07:51
quiquelltristanC: but is like the right path07:52
quiquelltristanC: we can help with that07:52
tristanCquiquell: right now, you need to give the tests instances as command line parameters07:52
quiquelltristanC: will test at idle brain cycles07:52
tristanCwe may deploy the new rest api on rdoproject.org, so that you can directly try the zuul-runner client07:53
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: config: add tenant.toDict() method and REST endpoint  https://review.openstack.org/62134407:55
quiquelltristanC: man this is fantastic work07:55
tristanCquiquell: heh you're welcome, but jhesketh did most of the work07:56
quiquelltristanC: would be nice to not need the whole zuul enchilada to run jobs07:57
*** quiquell is now known as quiquell|brb08:01
*** panda|off is now known as panda08:04
*** bjackman has joined #zuul08:04
*** quiquell|brb is now known as quiquell08:19
*** gtema has quit IRC08:26
*** bhavikdbavishi has joined #zuul08:26
*** mhu has joined #zuul08:36
*** avass has joined #zuul08:46
avassdoes anyone have experience with trying to run programs interactively, preferably under a set session on windows through zuul/ansible?08:46
*** jpena|off is now known as jpena08:51
*** hashar has joined #zuul09:09
*** strigazi has quit IRC09:15
badboyI think there's a bug in Nodepool and/or Zuul when installing through pip309:36
*** ssbarnea|bkp2 has joined #zuul09:38
badboyNodepool installs kubernetes-8.0.1 which installs urllib3-1.24 and chardet-3.0.4 but Zuul needs urllib3-1.22 and chardet-3.0.209:38
badboyinstalling kubernetes-8.0.0 and then forcing install of urllib3-1.22 and chardet-3.0.2 resolves the issue09:38
*** ssbarnea|rover has quit IRC09:39
*** ssbarnea|bkp2 has quit IRC09:59
*** ssbarnea|rover has joined #zuul10:00
*** ssbarnea|rover has quit IRC10:01
*** ssbarnea has joined #zuul10:01
openstackgerritJean-Philippe Evrard proposed openstack-infra/zuul-jobs master: Allow different filenames for Dockerfiles  https://review.openstack.org/63297910:02
*** luizbag has joined #zuul10:03
*** ssbarnea|bkp2 has joined #zuul10:05
*** ssbarnea has quit IRC10:07
openstackgerritMatthieu Huin proposed openstack-infra/zuul-jobs master: install-nodejs: add support for RPM-based OSes  https://review.openstack.org/63104910:38
badboyis it possible to watch only a particular branch in Zuul?11:06
avassbadboy: something like this https://zuul-ci.org/docs/zuul/admin/drivers/gerrit.html#attr-pipeline.trigger.%3Cgerrit%20source%3E.branch ?11:07
badboyavass: probably that's it11:09
badboyavass: whats the syntax? pipeline.trigger.my-gerrit-server.my-repo.my-branch?11:09
avassbadboy: not sure, i got it to work by setting an event branch specific11:11
badboyavass: would you mind sharing your config?11:11
avasshttp://paste.openstack.org/show/743453/11:13
badboyavass: thx11:14
badboyavass: what's mysql for?11:14
avassbadboy: it's so it reports to the mysql server which stores build status that's found at <tenant>/builds on the web ui11:17
badboyavass: without mysql the list of builds is empty?11:20
avassbadboy: yes, or at least that's my experience11:20
jktbadboy: and of course it supports many other SQL servers as well (I'm running it with postgres for example)11:22
badboyavass: well that's another thing that's not mentioned in the docs11:22
*** hashar has quit IRC11:30
avassis it possible for nodeset labels to be a list of labels so it requests one node with any of the labels listed?11:32
*** bhavikdbavishi has quit IRC11:33
*** hashar has joined #zuul11:47
*** hashar has quit IRC11:48
tobiashavass: afaik no12:23
tobiashcorvus, mordred: btw, I've understood our timeout problems that happened yesterday. The root cause was io slowness on the nodes causing the hard coded 60s timeout of the setup playbook to be exceeded.12:24
tobiashso no bug in reconfiguration around job timeouts12:24
tobiash*phew*12:24
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Improve logging around ansible timeouts  https://review.openstack.org/63319112:31
*** quiquell is now known as quiquell|food12:41
*** sanjayu_ has joined #zuul12:43
*** jpena is now known as jpena|lunch12:44
*** sanjayu_ has quit IRC12:53
*** bhavikdbavishi has joined #zuul13:01
avasstobiash: alright13:01
*** hashar has joined #zuul13:03
*** bjackman_ has joined #zuul13:04
*** bjackman has quit IRC13:07
*** bjackman_ has quit IRC13:07
*** bjackman has joined #zuul13:13
*** gtema has joined #zuul13:17
*** bjackman has quit IRC13:18
*** quiquell|food is now known as quiquell13:27
*** rlandy has joined #zuul13:31
*** bjackman has joined #zuul13:41
*** jpena|lunch is now known as jpena13:49
mordredtobiash: OH GOOD13:54
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Make setup playbook timeout configurable  https://review.openstack.org/63320613:54
*** bhavikdbavishi has quit IRC13:56
*** badboy has quit IRC14:11
*** gtema has quit IRC14:17
*** bjackman has quit IRC14:21
*** gtema has joined #zuul14:22
pabelangerthink I found an issue using add_host in untrusted job on executor, and I think ansible or zuul is handling wrong exit code: http://paste.openstack.org/show/743470/14:45
pabelangerI think what happened here, is I used the wrong SSH public key in authorized_keys on bastion.poctron.xyz, and ansible failed with unreachable14:45
pabelangerhowever return code is -4, which zuul thinks is a parse error14:46
pabelangerthis seems to case post-run playbooks not to run, and zuul retries the job, eventually hitting retry_limit but not uploading logs (since post-run doesn't execute)14:46
pabelangernot sure the right way forward here, as me being non zuul ops on sf.io, I had to ask jpena for executor logs, but only because I knew something was up.14:47
pabelangermordred: tobiash: corvus: ^thoughts14:47
pabelangers/case/cause14:48
tobiashpabelanger: unreachable is treated as infrastructure failure which causes a job retry14:49
tobiashso ansible is correct with unreachable and zuul is correct with retry actually14:49
pabelangertobiash: right, but in this case, is wants node nodepool provided, but once I used add_host outside of zuul14:50
tobiashmaybe the problem is that we may want to still run the post playbooks in case of the last try14:50
pabelangererr14:50
pabelangerit wasn't node*14:50
tobiashyeah, I understand your use case14:50
pabelangerI can't seem to remember why we don't run post-run jobs in that case, logs could be helpful14:51
tobiashmaybe we need a way to still continue (at least on the last try) to at least try to run the post playbooks on unreachable14:51
pabelangerlet me see why we don't run post-run by default, maybe something in code14:52
tobiashbecause we anticipate an infrastructure failure and if we retry there won't be any reporting about the failed attempt to the user14:52
mordredI think it's an optimization14:52
mordredbecause if we got unreachable, the assumption is that the nodepool nodes are broken, so what's the point of trying to run post14:52
mordred(clearly post also won't work, because the nodes are unreachable)14:52
mordredbut use of add_host in untrusted muddies this a bit14:53
tobiashand currently only the latest normal run is reported14:53
pabelangeryah, this is more a multinode job now, which 1 host is failing. So, possible we could get logs from other to help identify failures14:53
tobiashof course add_host gets hard to debug in this case14:53
mordredyah14:53
pabelangerwonder if we have some job flag to force the post-run playbook, on unreachable. kinda like we do for debug14:54
pabelangeror maybe we just enable it for debug flag14:54
*** gtema has quit IRC14:55
tobiashthe point is that we only report the last attempt to the user so running the post playbooks for any non-last attempts is wasted time14:55
tobiashbut maybe we should start thinking about also making the logs of all retries available and then force the execution of the post playbooks14:56
pabelangerwell, I think there is also the use case of intermittent issues, where first run fails, but 2nd passes. In that case, we don't get any logs and last failure post-run won't really work, but that is outside of this current issue14:57
tobiashwe could make the earlier retries available in the web ui on the buildset page14:57
tobiashalso this goes into that direction: https://review.openstack.org/63272714:57
tobiashthis is intended to improve ways to analyze infrastructure weaknesses that lead to retries14:58
pabelangeryes, as a zuul operator, those (retry) logs are helpful. users, maybe not14:58
tobiashwell, even for the users if they interact with project specific infrastructure in pre playbooks14:58
pabelangertobiash: yay on 632727 will review today, something I've also wanted14:59
tobiashso I'd find it very useful if we record the failed builds in the database that lead to the retry14:59
pabelanger+114:59
tobiashthat would make it possible to spot them in the builds tab too15:00
tobiashand if we'd do that, we should probably enforce running the post playbooks to at least try to log something15:00
tobiashbtw, how does this work? https://zuul.openstack.org/build/5514e07eb6da4b9ea8816ecd45f8357f15:02
tobiashjust noticed that zuul summarizes the failed task in zuul-web15:02
avasshow do i override the timeout to indefinite?15:05
avasstried setting it to 0 but that just actually set it to 0 hehe15:05
pabelangerdon't think we support indefinite for timeout15:05
avasspabelanger: it's supposed to be indefinite if it's not set15:06
avasspabelanger: is there any way to 'unset' it? i guess setting a very long timeout works as well15:07
pabelangeravass: where do you see that? indefinite if not set15:07
*** spsurya has quit IRC15:07
avasspabelanger: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.timeout15:07
pabelangerah, I forgot about that15:08
tobiashavass: set it to -115:08
pabelangerwas just going to suggest that15:08
tobiashavass: but you need to also set the max_job_timeout of the tenant to -115:08
avasstobiash: ah15:09
avassthat explains why that didn't work either15:09
tobiashavass: but I'd suggest to use a high but finite timeout, otherwise you will have jobs lingering around forever in case a test case deadlocks15:10
avasstobiash: yeah, it's just while I'm testing things while setting everything up anyways15:10
avasstobiash, pabelanger: shouldn't the default timeout then be max_job_timeout and not indefinite?15:13
tobiashavass: yes: https://review.openstack.org/62955215:13
tobiashbut I still need to address the comments15:14
avasswhat happens if the timeout is set higher than the limit?15:24
avassis it automatically 0 then?15:24
mordredtobiash, pabelanger: for the build page, we've also talked about having the executor put the log snippets for final-task-failures into the db and exposing them on that build page15:25
mordredit obviously requires us doing the 'db-is-required' first15:26
tobiashmordred: it already gets it via js from the log server15:26
mordredbut I think should make some of these "only admins can look at the build logs to see what went horribly wrong" errors better15:26
mordredtobiash: yah - but there has to be a build log for that to work15:26
tobiashmordred: ah you mean the buildlog.json directly from the executor?15:26
tobiashgreat15:27
tobiash:)15:27
mordredwe also capture the ansible output in case of ansible failure into the zuul executor logs - we could grab that text and put it into the db - not for all playbooks, but only if a final playbook fails15:27
mordred(capturing all of the ansible output would be a ton of things into the db that would almost always be ignored)15:27
tobiashmordred: well we have the job-output.json on the executor too (which is the same thing used now by the web ui)15:27
mordredtobiash: indeed. but I'm not sure we get the right things in job-output.json in cases where there was a broken ansible invocation or something15:28
mordredbut yeah ... there's data somewhere that we should be able to capture and collect15:28
mordredin those catastrophic cases15:28
tobiashfor this we have the syntax buffer we put into the buildlog.txt too15:29
jktI wonder why I cannot use a role defined by my parent project directly from a project's .zuul.yaml15:29
tobiashthat should be easy15:29
mordredtobiash: yah15:29
jktI can sidestep that via a "forwarding" job in the parent project15:29
tobiashjkt: you can import the roles of the other project15:29
tobiashjkt: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.roles15:30
*** spsurya has joined #zuul15:30
pabelangerYay, got untrusted job working with add_host: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/34/344d8933e08a208b674451d83af440534bd27590/post/windmill-config-deploy2/6821fe6/ara-report/15:30
pabelangerbastion2.yaml is playbook of interest15:30
tobiashjkt: example: https://git.zuul-ci.org/cgit/zuul/tree/.zuul.yaml#n6615:31
tobiashpabelanger: yay :)15:31
pabelangermordred: tobiash: so, I am a little torn on opening zuul_console ports on production server. Or am I overthinking this?  I guess I could firewall and only allow zuul-executors access to it15:32
pabelangermaybe I'll just wait until new logging work is finished for that15:32
pabelangersince ARA works as expected15:32
jktmy usecase: just have a job which runs that child project's ci/build.sh, so *something* like run-test-command from zuul-jobs, but without inheriting from your 'unittest' job15:33
jkttobiash: thanks15:33
jktI think I'll go with that tiny forwarding job, it's actually a bit shorter15:33
jkttobiash: can I import playbooks from a parent project?15:41
tobiashjkt: no, that's not possible15:42
tobiashonly roles15:42
pabelangeryah, usually that case you'd parent to the job with playbooks you'd like to use15:42
jkttobiash: thanks15:43
jktpabelanger: see above for what I'm trying to do; I guess I'll simply copy that playbook into my job, then15:43
pabelangerjkt: so, you want to use run-test-command, but not parent to unittests right?15:46
jktpabelanger: yes, that's what I wanted to do15:47
*** avass has quit IRC15:47
jktpabelanger: but anyway, it's really just a trivial ansible playbook, I just copied it15:47
pabelangeryah, I've often wanted to do that myself for reasons, but end up having to copy the playbooks into site specific zuul-jobs.15:48
*** cmurphy is now known as cmorpheus15:51
*** quiquell is now known as quiquell|off15:56
jktwhy am I getting a DISK_FULL message (zuul.ExecutorDiskAccountant: /tmp/tmpdzo80fgc/21510f7fe3d44b139b5e95e098884ead is using 280MB (limit=250)) like this one: http://paste.openstack.org/show/743480/16:11
jktthe disks on the executor and on the assigned node are definitely not full16:11
jktthat path does not exist on the executor, anyway (unless someone is playing with mount namespaces...)16:13
*** bhavikdbavishi has joined #zuul16:14
jktah, so it's probably due to the size of the git repo itself and https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.disk_limit_per_job16:18
*** bhavikdbavishi has quit IRC16:18
jktit's nice that zuul's sources are so easily grepable :)16:18
corvusjkt: yes -- one thing to note is that zuul clones repos into the workspaces from its own internal cache, and as long as they are on the same filesystem, it should use hard links, so that much of the git repo data (ie, the blobs) won't count against the history.  there's still a lot of local data (including the working directory), so some percentage of the repo will still take up space.16:22
corvuser, s/count against the history/count against the limit/16:22
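[Editor's note] The hard-link point above can be checked with a small experiment. This is a minimal sketch, not Zuul code: it only demonstrates that a hard link is a second name for the same on-disk data, which is why linked git objects would not add to a workspace's disk usage. The file names are made up for illustration.

```python
import os
import tempfile


def link_demo():
    """Create a file, hard-link it, and report whether both names share one inode."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names standing in for a cached object and its workspace copy.
        original = os.path.join(tmp, "cache-object")
        linked = os.path.join(tmp, "workspace-object")
        with open(original, "wb") as f:
            f.write(b"x" * 4096)
        # os.link creates a hard link: no data is copied.
        os.link(original, linked)
        a, b = os.stat(original), os.stat(linked)
        return a.st_ino == b.st_ino, a.st_nlink


same_inode, link_count = link_demo()
print(same_inode, link_count)
```

Hard links only work within one filesystem, which matches the "as long as they are on the same filesystem" condition in the message above.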
*** panda is now known as panda|off16:23
jktcorvus: yup, my repo was about 500MB in size16:24
jktnow, I'm trying to run a simple shell script on my build nodes; they are fedora 29 provisioned statically by nodepool16:25
jktI keep getting an error "Timeout exception waiting for the logger. Please check connectivity to [147.251.253.10:19885]"16:25
jktwhat sorts of communication is required to be open? I thought that it was all done via ansible over SSH16:25
corvuspabelanger, tobiash: we could have the scheduler tell the executor if this is the last retry of a job and have it run post playbooks in that case.16:26
corvusjkt: port 19885 needs to be open on the worker nodes in order for log streaming to work16:26
corvusjkt: we have a plan to eliminate that requirement, but work is not complete yet.16:26
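[Editor's note] A quick way to confirm whether the log-streaming port is reachable from the executor is a plain TCP connection test. This is a minimal sketch using only the standard library; the function name is hypothetical, and the only detail taken from the discussion is the port number 19885.

```python
import socket


def port_is_open(host, port=19885, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the executor host, `port_is_open("<node address>")` returning False would point at a firewall rule or at nothing listening on the node. It does not tell the two cases apart.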
jktnice, here it is, https://softwarefactory-project.io/docs/operator/nodepool_operator.html#add-a-cloud-provider16:27
pabelangercorvus: tobiash: wfm16:28
corvusnice.  that should probably be in zuul's documentation.16:28
jktcorvus: I'll propose a patch for this one16:29
tobiashcorvus: ++16:29
*** hashar has quit IRC16:30
corvustobiash: how long does your setup take?16:37
corvustobiash: (and why?)16:37
tobiashcorvus: we had ceph performance issues so I guess a limit of two minutes would have helped there16:38
tobiashI need to dig deeper into the logs to check the usual duration16:40
corvusok, so we probably don't need to update the default.  i can see how being able to change the value might be necessary.  +216:40
tobiashcorvus: :)16:41
tobiashcorvus: I think  we should also add the playbook which is being killed to the timeout log16:42
corvuswfm16:42
corvustobiash: (though, you should probably be able to figure that out by looking at previous entries)16:42
corvus(but it's fine to add it to make it easier)16:42
tobiashyeah, but having it there makes it easy to query kibana for "Ansible timeout exceeded" and make it possible to directly filter for e.g. the setup playbook16:43
*** bhavikdbavishi has joined #zuul16:46
corvuspabelanger: did you ever get to the bottom of the exclude pipeline issue?16:59
pabelangercorvus: not yet, I can start looking at it again once i finish converting this job to deploy directly from executor17:03
pabelangerhopefully this afternoon17:03
openstackgerritMerged openstack-infra/zuul master: Explicitly callout ZooKeeper as ext dependency  https://review.openstack.org/63273217:06
openstackgerritJan Kundrát proposed openstack-infra/zuul master: Incoming connections over 19885/TCP are needed on nodes  https://review.openstack.org/63324217:11
jktwhat is the most zuul-ish way of checking out git submodules?17:12
jktI'm using them for projects which are available from gerrit (and some of them requiring proper credentials, i.e., not an anonymous access)17:13
tobiashjkt: not doing it ;)17:13
jkttobiash: :)17:13
tobiashjkt: just kidding17:13
*** mrhillsman is now known as mrhillsman_lunch17:13
jktI'm afraid they are the least evil thing17:13
corvusi wish mordred were around, he just did some stuff with that... let me see if i can dig some stuff up17:13
tobiashjkt: zuul doesn't do submodule handling itself17:13
jktI'm mirroring some C projects from github to our gerrit, and they do not have a stable API, so I have to somehow pin them17:14
jkti.e., I cannot use zuul's cross-project tracking, really17:14
jkts/tracking/gating/17:14
tobiashwhat we do is we add the submodules to required-projects and patch the url to the according repo in the workspace and then initialize the submodules17:14
tobiash(during job runtime)17:14
corvusjkt: since you're mirroring them, could you add your own tags?17:14
corvustobiash: yes, that sounds like what mordred ended up doing too...17:15
jktcorvus: my typical workflow involves patching my code in a leaf project to adapt to a new API, *and* bumping the submodule commit hash in .gitmodules17:15
jktcorvus: this has worked well for us for years with zuul v2 and turbo-hipster17:15
tobiashsome of our projects even use branches as moving targets, add speculative submodule states and update the base repo in a post job17:16
jkttobiash: is there a playbook for this auto-checkout? I noticed that the "origin" remote is /dev/null17:16
tobiashjkt: the trick is to add them as required-project in the job17:16
tobiashthen all necessary repos will be synced to the node17:16
corvusjkt: here's what mordred did with submodules:  http://git.openstack.org/cgit/openstack-infra/system-config/tree/.zuul.yaml#n14917:16
corvusi think that matches what tobiash is saying17:16
corvusbut you probably won't use override-checkout17:17
tobiashand then you can patch the remote urls of the submodules to the local repos on disk17:17
jktthanks, I think I understand that17:17
jktis there a playbook for this auto-patching? :)17:17
corvusyou can do that in a pre-playbook17:17
corvusmordred ended up not patching the urls, but instead, moved the submodule git repos into place: http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/zuul/gerrit/repos.yaml17:17
tobiashwe didn't upstream one yet17:17
tobiashthat's the other possibility17:18
tobiashmoving is probably more efficient than patching url and initialize17:18
corvus(of course, you still need to check out the right sha)17:18
corvus(mordred's playbook doesn't do that since we had zuul check out the one we wanted)17:19
corvusbtw, if one wanted to upstream something, i think one could make that first task, where the repos are moved, generic by iterating over zuul.projects17:19
corvus(filtering for required=True)17:20
corvusor by reading .gitmodules17:20
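[Editor's note] The "reading .gitmodules" idea could look roughly like the following. This is a sketch under stated assumptions, not an upstream role: the helper names, the sample URL, and the `src/...` checkout path are all hypothetical. It parses .gitmodules with the standard-library configparser and maps each submodule path to a local checkout that a job has already synced (for example via required-projects).

```python
import configparser


def read_gitmodules(text):
    """Parse .gitmodules content into {submodule name: {"path": ..., "url": ...}}."""
    # Interpolation is disabled so a '%' in a URL is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    modules = {}
    for section in parser.sections():
        # Section headers look like: submodule "name"
        if section.startswith('submodule "') and section.endswith('"'):
            name = section[len('submodule "'):-1]
            modules[name] = {
                "path": parser.get(section, "path"),
                "url": parser.get(section, "url"),
            }
    return modules


def local_overrides(modules, checkouts):
    """Map submodule path -> local checkout dir for URLs present in `checkouts`."""
    return {
        info["path"]: checkouts[info["url"]]
        for info in modules.values()
        if info["url"] in checkouts
    }


# Hypothetical sample data; a real job would read the repo's own .gitmodules.
SAMPLE = '''[submodule "libfoo"]
\tpath = third_party/libfoo
\turl = ssh://gerrit.example.org:29418/mirror/libfoo
'''

modules = read_gitmodules(SAMPLE)
overrides = local_overrides(
    modules,
    {"ssh://gerrit.example.org:29418/mirror/libfoo":
     "src/gerrit.example.org/mirror/libfoo"},
)
print(overrides)
```

The resulting mapping could then drive either approach mentioned above: pointing each submodule URL at the local repo, or moving the repo into the submodule path. As noted in the discussion, checking out the pinned sha is still a separate step.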
openstackgerritMerged openstack-infra/zuul master: Make setup playbook timeout configurable  https://review.openstack.org/63320617:26
openstackgerritMerged openstack-infra/zuul master: Improve logging around ansible timeouts  https://review.openstack.org/63319117:26
pabelangerWOOT!17:28
pabelangerhttps://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/a7/a700ecc334444b1102887f429e8125866c8ed1d1/post/windmill-config-deploy/62c4d25/job-output.txt17:28
pabelangeractually did CD directly from zuul-executor with untrusted job17:28
pabelangerYay!17:28
SpamapSpabelanger: neat17:29
pabelangerit is still nested ansible, but add_host totally works17:29
SpamapSI think nested ansible is the way to go. Zuul's Ansible just isn't set up for large scale prod deployment.17:30
SpamapSWe actually don't have any VM's to touch with our CD, just kubernetes and terraform, so a throw-away Zuul VM with secrets to talk to those works great.17:30
SpamapSWhat does not work great is "oh that failed because of X and now we need to retry.. fuuuu"17:30
pabelangerYah, last time I tried, groups of groups was the blocker from running production playbooks from executor17:30
SpamapSWe have to land pretend commits to retry :-/17:30
corvusSpamapS: if you have a minute, i think https://review.openstack.org/623927 is pretty close to being in shape and iirc, you had early thoughts on that.17:33
SpamapSoh yay17:33
corvus(i think we want to leave it open for more review, so don't +3 it :)17:33
corvusmaybe i'll send out email early next week with an eye to merging it by the end of the week...17:33
SpamapSACK, (yeah specs I tend to think we need more than just 2x+2's)17:33
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Revert "Add a timeout for the image build"  https://review.openstack.org/63325217:36
*** jpena is now known as jpena|off17:44
*** pvinci has joined #zuul17:45
pvinciHello.  Is this the best forum to discuss proposals for changes?17:47
openstackgerritMerged openstack-infra/zuul master: executor: properly format error exception  https://review.openstack.org/63092817:49
tobiashpvinci: if you mean changes to zuul, then yes :)17:49
corvuspvinci: it is a forum to do so, and a good place to start, and at the very least, if a different forum is appropriate, we can figure that out here :)17:49
pvinciI ran into an issue with Repo in merger.py failing due to an unknown hostkey.  I threw in 10 lines of code to work past it, but it's too trusting of a position to take. I think the gerrit driver should be extended with a  'hostkey' entry for verifying the end host system when using ssh.17:53
corvuspvinci: hrm, i thought we had the mergers and executors automatically accept a new host key the first time.  and if you want to provide a host key, you can write a .ssh/known_hosts file...17:55
corvuspvinci: this code should auto-add the host key if it isn't there already: http://git.zuul-ci.org/cgit/zuul/tree/zuul/merger/merger.py#n10917:56
*** bhavikdbavishi has quit IRC17:57
pvinciThat's the code I changed.  See the pass in line 122?17:58
pvinci        if not os.path.exists(path):             seen = False             with open(path, 'r') as kh:                 for line in kh.readlines():                     if line.startswith(url.hostname):                         seen = True                         break             if not seen:                 with open(path, 'a') as kh:                     kh.write(self.hostkey)17:58
pvinciguess that doesn't work here. ;)17:58
corvuspvinci: you can copy/paste your code into http://paste.openstack.org/ and then copy the resulting url here to share it17:59
pvincihttp://paste.openstack.org/show/743484/17:59
corvuspvinci: but i think i understand your change.  what i don't understand is what problem you ran into that you need to solve.  did the key change?18:00
pvincigit.exc.GitCommandError: Cmd('git') failed due to: exit code(128)   cmdline: git clone ssh://xxx /var/lib/zuul/executor-git/xxx   stderr: 'Cloning into '/var/lib/zuul/executor-git/xxx'... Warning: Identity file /var/ssh/xxx not accessible: No such file or directory. Host key verification failed. fatal: Could not read from remote repository.18:05
corvuspvinci: is there a log line that precedes that which starts with "Unable to set up known_hosts" ?18:07
pvinciThere is an Exception: Signature verification (ssh-ed25519) failed.18:09
corvuspvinci: are you using gerrit or github?18:12
pvincigerrit18:12
corvuspvinci: what version?18:12
pvinci2.14.618:13
corvuspvinci: you may be running into this bug: https://bugs.chromium.org/p/gerrit/issues/detail?id=650418:13
pvinciThanks!  That is helpful.18:15
*** mrhillsman_lunch is now known as mrhillsman18:15
corvuspvinci: i believe in that case we may not be able to automatically do the right thing in zuul.  so i think you are correct to solve this by writing the host key to known_hosts.  but i think rather than updating zuul to do this, it may be simpler for you to write the known_hosts file yourself.  as long as it's there before zuul starts, it will use it.18:15
pvinciYes. I thought about having ansible drop in the file, but that seemed like the long way around.18:16
pvinciI can do that.18:16
corvuspvinci: if we were to add host key support to zuul like you suggest, it would be a bit more work since we have to support multiple connections, so we'd really have to manage multiple entries, and the whole lifecycle of the file.18:16
pvinciI can add it via ansible.  Thanks.18:17
corvuspvinci: yeah, i *think* that "have ansible drop in the file" is probably the way most folks are leaning right now.  i could definitely see us going the other way though; it's a tough call.  :)18:17
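[Editor's note] Writing the known_hosts entry ahead of time, as suggested above, amounts to an idempotent append. The sketch below is not the code from merger.py or from the paste; the function name and the example key are hypothetical. It assumes plain (unhashed) known_hosts entries and that the caller supplies the host pattern exactly as ssh would record it, which for a non-default port is the `[host]:port` form.

```python
import os


def ensure_known_host(path, host_pattern, key_line):
    """Append key_line to the known_hosts file unless host_pattern is already listed.

    Returns True if the file was modified.
    """
    if os.path.exists(path):
        with open(path, "r") as kh:
            for line in kh:
                fields = line.split()
                # The first field is a comma-separated list of host patterns.
                if fields and host_pattern in fields[0].split(","):
                    return False
    # Create the containing directory (e.g. ~/.ssh) if it is missing.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a") as kh:
        kh.write(key_line.rstrip("\n") + "\n")
    return True
```

A call might look like `ensure_known_host(path, "[gerrit.example.org]:29418", "[gerrit.example.org]:29418 ssh-rsa AAAA...")`, with the real key taken from a trusted source rather than accepted on first contact. Hashed entries would not be detected by this check.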
openstackgerritMerged openstack-infra/zuul master: Fix noop job toDict()  https://review.openstack.org/63040918:17
pvinciAlso, how do you feel about doing shallow git clones instead of the full clones we do now?18:20
corvuspvinci: a lot of testing requires the full history, or the ability to change branches, so a full clone is the most universally applicable.  i'm also not sure that all the speculative merges would succeed or produce the same output without the full clones.  however, it's worth noting that as long as the executor's merger and build directories are on the same filesystem, they will use hard links and so the18:22
corvusdisk space cost is much lower.18:22
pvinciok.18:26
pvinciOne last thing, then I'll leave you alone for a while.  I see this in the logs as well. AttributeError: 'MergeJob' object has no attribute 'updated'.18:28
corvuspvinci: it's harmless, we should fix the log line to remove that :)18:28
pvinciok.  Thanks.  I can contribute that.  Just wasn't sure that it wouldn't mask an issue.18:30
corvusi think it's just a very old log line which didn't keep up with the changes around it.  should be fine to clean up, thanks!18:30
pvinciMy use case here is a little different.  I'm relying on zuul to trigger on specific changes in an upstream project.18:36
corvuspvinci: that's great! you're not alone though, several folks here (openstack and openlab come to mind) do that18:38
tobiashcorvus, pvinci: I've seen this attribute error when there was no merger that responded to the 'cat' request18:41
pvinciWhat is "cat"?18:41
tobiashzuul relies on the zuul-executors and mergers to get the configuration from the git repos18:41
tobiashit uses gearman jobs to distribute that18:42
tobiashand one operation is 'cat' that asks the merger 'give me all zuul.yaml filed from that repo on branch x'18:42
tobiashs/filed/files18:43
corvus(like "git cat-file")18:43
tobiashcorvus: what I mean is here: https://git.zuul-ci.org/cgit/zuul/tree/zuul/configloader.py#n159018:45
corvusoh, so we might be masking that error18:46
tobiashif that times out, the job has no updated attribute and the scheduler is doomed18:46
corvussame result, wrong error message18:46
tobiashyes, we get some stack trace but actually want to know that we couldn't fetch some data from a repo18:47
tobiashfurther we escape from the whole loop then18:47
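A hedged sketch of the problem described above: a job that never got a merger response has no `updated` attribute, so reading it raises AttributeError and hides the real cause. The classes and function here are hypothetical stand-ins, not zuul's configloader code.

```python
class MergerTimeoutError(Exception):
    """Hypothetical error: no merger answered the 'cat' request."""


class FakeMergeJob:
    """Stand-in for a merge job; `updated` is only set once a result arrives."""

    def __init__(self, name):
        self.name = name

    def on_result(self, files):
        self.updated = True
        self.files = files


def collect_files(jobs):
    """Return {job name: files}, failing with a clear message on silent jobs."""
    results = {}
    for job in jobs:
        # getattr with a default avoids the AttributeError that masks the cause.
        if not getattr(job, "updated", False):
            raise MergerTimeoutError(
                "no merger response for %s; could not fetch config" % job.name)
        results[job.name] = job.files
    return results
```

Whether to raise or to skip the job and continue the loop is a design choice this sketch leaves open.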
pabelangerI know why, but sometimes wish a job could set the timer value for a periodic pipeline18:51
pabelangermordred: when you happen to be around again, I cannot seem to figure out why I am getting 'Waiting on logger' when using add_host from executor.  I believe I have proper ports open on firewall: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/f5/f5d9ac249e1a53f82e8415ddbadde9006122c722/post/windmill-config-deploy/bb7b2c1/job-output.html#l11418:56
pabelangereventually output is rendered, but not realtime, happens in blobs at end of task18:57
corvuspabelanger: do you start zuul_console?18:59
corvushttp://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/prepare-workspace/tasks/main.yaml#n118:59
pabelangercorvus: yup!18:59
pabelangerhttps://github.com/ansible-network/windmill-config/blob/master/tests/playbooks/pre.yaml19:00
ShrewsThis is frustrating. nodepool-builder works locally for me.19:04
clarkbcould it be an arm specific issue?19:04
clarkbnb03 is our arm builder19:04
ShrewsThe obvious thing to look at would be permissions (particularly for the --logfile option to disk-image-create), but that looks ok19:05
Shrewsclarkb: i don't see how that would make any difference here19:05
openstackgerritPaul Vinciguerra proposed openstack-infra/zuul master: configloader.py: Not all jobs have updated attribute.  https://review.openstack.org/63325919:05
Shrewsit built an arm image as recently as yesterday19:06
corvusShrews: the revert failed on a flaky test, so we have time to change our minds about that.  i've rechecked it.19:06
Shrewscorvus: yeah, let's remove the +A from it so we can poke more19:07
Shrewsi mean, i'm not sure what else to poke, but...19:08
*** luizbag has quit IRC19:08
corvusShrews: done.  it'll take a while for the check-recheck anyway, hopefully we'll have a zuul+1 ready if we decide to +3 it again.19:08
Shrewsi could temporarily change it to not use the --logfile option to see if it at least starts to build19:09
Shrewsthink i'll do that19:09
*** bjackman has joined #zuul19:12
Shrewsnope, that's not it19:12
corvusShrews: as a sanity check, do you want to do a manual revert on nb03 and make sure that works?19:13
corvusShrews: usually i check out a copy in my homedir at the right commit, then "sudo pip3 install ."19:13
corvusoh you may have already done that to do the --logfile thing anyway, huh.  nm.  :)19:14
corvusmain point is, if '--logfile' isn't the issue, then i don't know what is and i wonder whether a revert would even fix it.19:15
Shrewscorvus: i just edited the installed builder.py, didn't checkout from git19:15
corvusah k.  probably want to do the pip install thing to test the revert.19:15
Shrewscorvus: yeah19:16
Shrewscorvus: yup, revert works19:20
Shrews*sigh*19:20
Shrewswhat the actual fudge19:20
jktI wonder what is the reason behind setting the origin remote's url to /dev/null19:21
corvusjkt: because the actual origin (zuul) is not accessible, and the nominal origin (gerrit/github) has the wrong data (it doesn't have the speculative future state that zuul does).19:22
corvus(also, the nominal origin may be inaccessible too, if it requires credentials)19:23
jktcorvus: okay, thanks... I'm thinking about how to implement these submodules I mentioned earlier19:23
jkta project might specify a ref which is not part of the default branch's history, for example, and I think one needs to have origin available to fetch that19:24
corvusjkt: the repo on disk should have all of the upstream refs19:24
corvusnot just the default branch, it should have all of them19:24
jktokay, that's good to know19:24
corvus(this is, btw, a difference between zuulv2 and v3 -- v2 didn't always have that)19:25
pabelangerI actually ran into that issue recently with post pipeline job, origin was /dev/null now on production server. Okay for now, as going to try and have zuul push the repo over pulling it from github.com19:26
jktso it's really "just" a matter of (for each cloned project) recursively list its submodules, assert that it's a relative URL, and update19:26
corvusjkt: that sounds correct to me, but i hide under my desk when anyone says submodules :)19:26
tobiashsubmodules19:27
* corvus hides19:28
jktcorvus: according to the docs, `git submodule XXX` treats the relative URLs as relative to the default remote, i.e., origin19:28
tobiash:)19:28
corvustobiash: ^ what do you do to your urls?19:29
*** bjackman has quit IRC19:29
tobiashI'm not directly involved in that project but I believe they patch the submodule urls to be file:///home/zuul/src/<project>19:32
tobiashand then just git submodule update --init19:33
corvusah that makes sense.  jkt ^19:33
tobiashs/patch/patch in a pre-playbook19:33
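The pre-playbook step described above might look roughly like this for the non-recursive case. It is a sketch under assumptions: submodule URLs are relative, the checkouts sit under /home/zuul/src/<project> as mentioned in the chat, and the helper names are invented. It only builds the git commands and does not run them.

```python
import posixpath


def resolve_relative(superproject, url):
    """Resolve a relative submodule URL against the superproject's name."""
    if not url.startswith(("./", "../")):
        raise ValueError("not a relative submodule URL: %s" % url)
    resolved = posixpath.normpath(posixpath.join(superproject, url))
    if resolved.startswith(".."):
        raise ValueError("URL escapes the project namespace: %s" % url)
    return resolved


def rewrite_commands(superproject, submodules, src_root="/home/zuul/src"):
    """Build git commands pointing each submodule at its local checkout.

    `submodules` maps submodule name -> relative URL taken from .gitmodules.
    """
    commands = []
    for name, url in sorted(submodules.items()):
        project = resolve_relative(superproject, url)
        local = "file://" + posixpath.join(src_root, project)
        # `git config --file` edits .gitmodules in place.
        commands.append(["git", "config", "--file", ".gitmodules",
                         "submodule.%s.url" % name, local])
    commands.append(["git", "submodule", "update", "--init"])
    return commands
```

Each command list could be passed to `subprocess.run(cmd, cwd=checkout, check=True)` from within the superproject's checkout.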
corvuswe should write this up and add it to https://zuul-ci.org/docs/zuul/user/howtos.html19:34
tobiashgood idea19:34
tobiashso that's the 'easy' part of submodules19:34
* corvus hides19:35
tobiashlol19:35
tobiashwe have another project that uses a base repo, gates the submodules and updates the references in a post job19:36
*** electrofelix has quit IRC19:37
corvusi just emitted a long email to zuul-discuss on speculative container images19:38
corvusthat's item #1 on today's todo list.19:39
corvus(writing the email)19:39
jktif I "just" add a call to https://gitpython.readthedocs.io/en/stable/reference.html#git.objects.submodule.root.RootModule.update into zuul/executor/server.py , perhaps guarded by an option (similar to override_*), would that be acceptable upstream?19:40
tobiashand I have another piece of advice when using submodules: refrain from using recursive submodules, if possible19:40
corvusjkt: i think submodules are too confusing and dangerous for zuul to work with; i'd prefer to get a solid set of shared roles in zuul-jobs to work with them19:40
jkttobiash: :(, we're working with boost.org19:40
tobiashjkt: you're doomed ;)19:41
jktcorvus: unless the jobs/whatever performs an ACL check within Zuul, there's a nice possibility of repo ACL bypass if the zuul uses a gerrit account that is powerful enough19:42
tobiashcorvus: yes, that's what I suggest to any projects, if you want to work with submodules, don't try to tell zuul about it, handle it in the jobs and add all submodules to required projects so the respective repos are on disk on the node19:42
tobiashjkt: if you have restrictive acls you should split the projects by tenants in zuul19:43
jkttobiash: and use separate gerrit users in there as well, true19:43
corvusjkt: do you mean if we supported submodules in zuul?  yes, that's one of the concerns.  if you mean now, then what tobiash says -- zuul won't check out projects which aren't in the tenant, so if you don't add that project to that tenant, zuul won't let it be used in required-projects.19:44
corvus(which is something we should keep in mind if we add support for implicit required-projects :)19:44
jktcorvus: well, a simple implementation of submodules would just call `git submodule update --init` from within a trusted playbook, right?19:45
jktthat doesn't check required-projects19:45
tobiashjkt: that won't work, as the node has no credentials at all to access gerrit19:45
Shrewscorvus: i'm stumped. i've output the ENV vars being used, duplicated those in my shell, and ran the command (as nodepool) output in the log and it works. This points to something with the Popen() call, but that works locally for me.19:46
tobiashzuul prepares it according to the job and what's allowed to the tenant and pushes the prepared repos to the node19:46
jkttobiash: I meant a pre-playbook, running on the executor/merger/whatever19:46
tobiashthe job itself has no access to the scm19:46
tobiashjkt: even a trusted pre-playbook doesn't have access to the scm19:46
jktah well19:46
jktthat rules out that simple method, then19:46
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects  https://review.openstack.org/61314319:47
corvus^ left a note for myself to remember to deal with that case :)19:47
jktso I would *have* to ask git checkout for a list of submodules, massage their URLs to the already existing checkouts, `git submodule update --init`19:47
Shrewsthough i'm running python 3.6.7 locally, builders have 3.5.2, so maybe a lib difference?19:47
corvusShrews: i can take a look after lunch19:48
tobiashjkt: yes19:48
jktthat's quite some rewriting, especially if this is to work recursively19:48
jkt-> not a job for a Friday evening.19:48
tobiashjkt: that's why I said, refrain from recursive submodules ;)19:48
jktECANNOT :(19:49
tobiashthen you'll need to do this rewriting recursively19:49
openstackgerritPaul Vinciguerra proposed openstack-infra/zuul master: configloader.py: Not all jobs have an .updated attribute.  https://review.openstack.org/63325919:49
corvuswe should definitely make that a role and put it in zuul-jobs.19:50
tobiashjkt: my advice for this: don't try this with distinct standard ansible tasks, use a custom python module19:50
corvuswe actually have a test framework in zuul-jobs for roles with python modules, so we should be able to get this pretty solid.19:51
tobiashthe non-recursive would be relatively easy per shell tasks, but for the full blown recursive version I'd prefer a python module and as corvus says using the zuul-jobs test framework :)19:51
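For the recursive case, one piece a python module would need is the full set of projects reachable through nested submodules, since each of them has to be in required-projects to be on disk. A rough sketch, assuming relative URLs only; the `gitmodules` mapping is a hypothetical input that a real module would read from each checkout's .gitmodules.

```python
import posixpath


def required_projects(root, gitmodules):
    """Return every project reachable from `root` via relative submodules.

    `gitmodules` maps project name -> list of relative submodule URLs.
    """
    seen = set()
    stack = [root]
    while stack:
        project = stack.pop()
        if project in seen:
            continue  # guards against cycles and shared submodules
        seen.add(project)
        for url in gitmodules.get(project, []):
            # Relative URLs are resolved against the containing project.
            stack.append(posixpath.normpath(posixpath.join(project, url)))
    return sorted(seen)
```

An explicit stack instead of recursion keeps deeply nested trees from hitting the interpreter's recursion limit.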
corvusug http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/upload-logs-swift/library/test_zuul_swift_upload.py19:51
corvusthat was supposed to be "e.g.," not "ug".19:52
corvusi'm gonna go get lunch now.19:52
jktthanks a lot for these, I'll be happy to contribute this once I get it working19:53
jktnext week :)19:53
*** spsurya has quit IRC20:37
*** pvinci has quit IRC21:06
dmsimardWe're getting close enough to a release of ARA 1.0 that I've started slowly looking at what it would mean for Zuul -- I've created an etherpad to highlight what's new and some ideas to help foster discussion: https://etherpad.openstack.org/p/ara-1.0-in-zuul21:17
dmsimardIt's still a work in progress and it doesn't have all the answers yet but hopefully it's a good start :p21:19
dmsimardI'm excited about the API and some of the other features but I also feel like a lot of that depends on the amount of reliance (or coupling) Zuul is interested in having with ara21:21
dmsimardWe're friday on the way out so I can send an email to zuul-discuss too if it's appropriate :)21:22
*** rlandy has quit IRC21:37
openstackgerritMerged openstack-infra/nodepool master: Revert "Add a timeout for the image build"  https://review.openstack.org/63325222:37
