*** rlandy has quit IRC | 00:48 | |
pabelanger | I'm looking into project-ssh-key for openstack, but for some reason they don't load properly: https://zuul.openstack.org/api/project-ssh-key/openstack-infra/project-config.pub | 02:33 |
---|---|---|
pabelanger | okay, thanks to help from tristanC curl does work | 02:46 |
*** bhavikdbavishi has joined #zuul | 02:47 | |
tristanC | pabelanger: fwiw the url also loads fine with Firefox | 02:50 |
*** bhavikdbavishi has quit IRC | 02:52 | |
*** chandankumar has quit IRC | 02:57 | |
*** chandankumar has joined #zuul | 02:59 | |
*** bhavikdbavishi has joined #zuul | 02:59 | |
*** bjackman has quit IRC | 03:25 | |
pabelanger | tristanC: hmm, so maybe just chrome. I actually get an error: http://paste.openstack.org/show/743437/ | 03:27 |
ianw | how bizarre, i paste that link into my firefox and get a non-working zuul status page | 03:31 |
ianw | but then shift reload it, and get the key | 03:31 |
ianw | some sort of weird caching thing? | 03:31 |
*** bjackman has joined #zuul | 03:34 | |
*** sanjayu_ has joined #zuul | 03:35 | |
pabelanger | refresh doesn't work for me on chrome | 03:38 |
pabelanger | I just see Fetching info... | 03:39 |
*** sanjayu_ has quit IRC | 03:39 | |
ianw | tristanC / pabelanger: yeah -> https://imgur.com/a/jqWjNlO | 03:50 |
ianw | it looks like project-config.pub is served up from a service worker | 03:50 |
ianw | which messes up the app ... when i do a shift-reload it actually grabs it from the remote end | 03:52 |
*** bhavikdbavishi has quit IRC | 04:03 | |
*** bhavikdbavishi has joined #zuul | 04:12 | |
tristanC | ianw: yes that's expected, the service worker doesn't let you query the api directly, and it will try to interpret those URLs as html5 links | 04:24 |
tristanC | well we may be able to trick it into loading the actual link, but that's not implemented yet | 04:25 |
*** spsurya has joined #zuul | 05:40 | |
*** badboy has joined #zuul | 06:27 | |
*** quiquell|off is now known as quiquell | 06:44 | |
quiquell | ianw, tristanC: We have been testing a new zuul version and it breaks zuul-web | 07:28 |
tristanC | quiquell: what's breaking? | 07:28 |
quiquell | tristanC: the web UI; the API is good | 07:29 |
quiquell | tristanC: It could be our setup, do you know if there are any known issues with that? | 07:29 |
tristanC | quiquell: we didn't have any issue when updating sf-3.2 with zuul-3.5.0, what's your issue? | 07:30 |
*** bjackman has quit IRC | 07:30 | |
quiquell | tristanC: ack, have to check it myself they told me today | 07:32 |
tristanC | there may be an issue if you restart all the services at once, it seems like they try to create the new artifact table concurrently | 07:33 |
tristanC | we mitigated that in software-factory by manually doing the alembic migration before starting the services | 07:34 |
*** gtema has joined #zuul | 07:38 | |
quiquell | tristanC: hmm, good to know, let me test it myself, I'll come back with more info | 07:39 |
quiquell | panda|off: this is good now ? https://review.rdoproject.org/r/#/c/18475/ | 07:40 |
tristanC | quiquell: btw, have you checked the zuul-runner change? It lets you run a job locally without the services, just direct ansible-playbook: https://review.openstack.org/632064 | 07:41 |
quiquell | tristanC: holy sh... really ? | 07:44 |
quiquell | tristanC: we will test that, that would be fantastic, | 07:45 |
quiquell | but the review seems unrelated to that | 07:46 |
quiquell | is it the correct one? | 07:46 |
tristanC | quiquell: zuul-runner (topic:freeze_job) is mentioned on https://tree.taiga.io/project/tripleo-ci-board/epic/5 ... | 07:46 |
tristanC | quiquell: that last review adds --depends-on argument to run local job with speculative change | 07:47 |
tristanC | quiquell: you need the whole patch stack to make it work | 07:47 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: config: add playbooks to job.toDict() https://review.openstack.org/621343 | 07:48 |
quiquell | tristanC: This is exactly what we needed, let's work out next sprints | 07:49 |
quiquell | tristanC: thanks so much | 07:49 |
*** bhavikdbavishi has quit IRC | 07:50 | |
tristanC | quiquell: well we need to groom what is missing, such as secrets substitution map and some sort of embedded nodepool-launcher to manage instances lifecycle | 07:51 |
quiquell | tristanC: but it looks like the right path | 07:52 |
quiquell | tristanC: we can help with that | 07:52 |
tristanC | quiquell: right now, you need to give the test instances as command line parameters | 07:52 |
quiquell | tristanC: will test at idle brain cycles | 07:52 |
tristanC | we may deploy the new rest api on rdoproject.org, so that you can directly try the zuul-runner client | 07:53 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: config: add tenant.toDict() method and REST endpoint https://review.openstack.org/621344 | 07:55 |
quiquell | tristanC: man this is fantastic work | 07:55 |
tristanC | quiquell: heh you're welcome, but jhesketh did most of the work | 07:56 |
quiquell | tristanC: would be nice to not need the whole zuul enchilada to run jobs | 07:57 |
*** quiquell is now known as quiquell|brb | 08:01 | |
*** panda|off is now known as panda | 08:04 | |
*** bjackman has joined #zuul | 08:04 | |
*** quiquell|brb is now known as quiquell | 08:19 | |
*** gtema has quit IRC | 08:26 | |
*** bhavikdbavishi has joined #zuul | 08:26 | |
*** mhu has joined #zuul | 08:36 | |
*** avass has joined #zuul | 08:46 | |
avass | does anyone have experience with trying to run programs interactively, preferably under a set session on windows through zuul/ansible? | 08:46 |
*** jpena|off is now known as jpena | 08:51 | |
*** hashar has joined #zuul | 09:09 | |
*** strigazi has quit IRC | 09:15 | |
badboy | I think there's a bug in Nodepool and/or Zuul when installing through pip3 | 09:36 |
*** ssbarnea|bkp2 has joined #zuul | 09:38 | |
badboy | Nodepool installs kubernetes-8.0.1 which installs urllib3-1.24 and chardet-3.0.4 but Zuul needs urllib3-1.22 and chardet-3.0.2 | 09:38 |
badboy | installing kubernetes-8.0.0 and then forcing the install of urllib3-1.22 and chardet-3.0.2 resolves the issue | 09:38 |
*** ssbarnea|rover has quit IRC | 09:39 | |
*** ssbarnea|bkp2 has quit IRC | 09:59 | |
*** ssbarnea|rover has joined #zuul | 10:00 | |
*** ssbarnea|rover has quit IRC | 10:01 | |
*** ssbarnea has joined #zuul | 10:01 | |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/zuul-jobs master: Allow different filenames for Dockerfiles https://review.openstack.org/632979 | 10:02 |
*** luizbag has joined #zuul | 10:03 | |
*** ssbarnea|bkp2 has joined #zuul | 10:05 | |
*** ssbarnea has quit IRC | 10:07 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul-jobs master: install-nodejs: add support for RPM-based OSes https://review.openstack.org/631049 | 10:38 |
badboy | is it possible to watch only a particular branch in Zuul? | 11:06 |
avass | badboy: something like this https://zuul-ci.org/docs/zuul/admin/drivers/gerrit.html#attr-pipeline.trigger.%3Cgerrit%20source%3E.branch ? | 11:07 |
badboy | avass: probably that's it | 11:09 |
badboy | avass: whats the syntax? pipeline.trigger.my-gerrit-server.my-repo.my-branch? | 11:09 |
avass | badboy: not sure, i got it to work by setting an event branch specific | 11:11 |
badboy | avass: would you mind sharing your config? | 11:11 |
avass | http://paste.openstack.org/show/743453/ | 11:13 |
badboy | avass: thx | 11:14 |
badboy | avass: what's mysql for? | 11:14 |
avass | badboy: it's so it reports to the mysql server which stores build status that's found at <tenant>/builds on the web ui | 11:17 |
badboy | avass: without mysql the list of builds is empty? | 11:20 |
avass | badboy: yes, or at least that's my experience | 11:20 |
jkt | badboy: and of course it supports many other SQL servers as well (I'm running it with postgres for example) | 11:22 |
badboy | avass: well that's another thing that's not mentioned in the docs | 11:22 |
*** hashar has quit IRC | 11:30 | |
avass | is it possible for nodeset labels to be a list of labels so it requests one node with any of the labels listed? | 11:32 |
*** bhavikdbavishi has quit IRC | 11:33 | |
*** hashar has joined #zuul | 11:47 | |
*** hashar has quit IRC | 11:48 | |
tobiash | avass: afaik no | 12:23 |
tobiash | corvus, mordred: btw, I've understood our timeout problems that happened yesterday. The root cause was io slowness on the nodes causing the hard coded 60s timeout of the setup playbook to be exceeded. | 12:24 |
tobiash | so no bug in reconfiguration around job timeouts | 12:24 |
tobiash | *phew* | 12:24 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Improve logging around ansible timeouts https://review.openstack.org/633191 | 12:31 |
*** quiquell is now known as quiquell|food | 12:41 | |
*** sanjayu_ has joined #zuul | 12:43 | |
*** jpena is now known as jpena|lunch | 12:44 | |
*** sanjayu_ has quit IRC | 12:53 | |
*** bhavikdbavishi has joined #zuul | 13:01 | |
avass | tobiash: alright | 13:01 |
*** hashar has joined #zuul | 13:03 | |
*** bjackman_ has joined #zuul | 13:04 | |
*** bjackman has quit IRC | 13:07 | |
*** bjackman_ has quit IRC | 13:07 | |
*** bjackman has joined #zuul | 13:13 | |
*** gtema has joined #zuul | 13:17 | |
*** bjackman has quit IRC | 13:18 | |
*** quiquell|food is now known as quiquell | 13:27 | |
*** rlandy has joined #zuul | 13:31 | |
*** bjackman has joined #zuul | 13:41 | |
*** jpena|lunch is now known as jpena | 13:49 | |
mordred | tobiash: OH GOOD | 13:54 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Make setup playbook timeout configurable https://review.openstack.org/633206 | 13:54 |
*** bhavikdbavishi has quit IRC | 13:56 | |
*** badboy has quit IRC | 14:11 | |
*** gtema has quit IRC | 14:17 | |
*** bjackman has quit IRC | 14:21 | |
*** gtema has joined #zuul | 14:22 | |
pabelanger | think I found an issue using add_host in untrusted job on executor, and I think ansible or zuul is handling wrong exit code: http://paste.openstack.org/show/743470/ | 14:45 |
pabelanger | I think what happened here is I used the wrong SSH public key in authorized_keys on bastion.poctron.xyz, and ansible failed with unreachable | 14:45 |
pabelanger | however return code is -4, which zuul thinks is a parse error | 14:46 |
pabelanger | this seems to case post-run playbooks not to run, and zuul retries the job, eventually hitting retry_limit but not uploading logs (since post-run doesn't execute) | 14:46 |
pabelanger | not sure the right way forward here, as me being non zuul ops on sf.io, I had to ask jpena for executor logs, but only because I knew something was up. | 14:47 |
pabelanger | mordred: tobiash: corvus: ^thoughts | 14:47 |
pabelanger | s/case/cause | 14:48 |
tobiash | pabelanger: unreachable is treated as infrastructure failure which causes a job retry | 14:49 |
tobiash | so ansible is correct with unreachable and zuul is correct with retry actually | 14:49 |
pabelanger | tobiash: right, but in this case, is wants node nodepool provided, but once I used add_host outside of zuul | 14:50 |
tobiash | maybe the problem is that we may want to still run the post playbooks in case of the last try | 14:50 |
pabelanger | err | 14:50 |
pabelanger | it wasn't node* | 14:50 |
tobiash | yeah, I understand your use case | 14:50 |
pabelanger | I can't seem to remember why we don't run post-run jobs in that case, logs could be helpful | 14:51 |
tobiash | maybe we need a way to still continue (at least on the last try) to at least try to run the post playbooks on unreachable | 14:51 |
pabelanger | let me see why we don't run post-run by default, maybe something in code | 14:52 |
tobiash | because we anticipate an infrastructure failure and if we retry there won't be any reporting about the failed attempt to the user | 14:52 |
mordred | I think it's an optimization | 14:52 |
mordred | because if we got unreachable, the assumption is that the nodepool nodes are broken, so what's the point of trying to run post | 14:52 |
mordred | (clearly post also won't work, because the nodes are unreachable) | 14:52 |
mordred | but use of add_host in untrusted muddies this a bit | 14:53 |
tobiash | and currently only the latest normal run is reported | 14:53 |
pabelanger | yah, this is more of a multinode job now, in which 1 host is failing. So it's possible we could get logs from the others to help identify failures | 14:53 |
tobiash | of course add_host gets hard to debug in this case | 14:53 |
mordred | yah | 14:53 |
pabelanger | wonder if we have some job flag to force the post-run playbook, on unreachable. kinda like we do for debug | 14:54 |
pabelanger | or maybe we just enable it for debug flag | 14:54 |
*** gtema has quit IRC | 14:55 | |
tobiash | the point is that we only report the last attempt to the user so running the post playbooks for any non-last attempts is wasted time | 14:55 |
tobiash | but maybe we should start thinking about also making the logs of all retries available and then force the execution of the post playbooks | 14:56 |
pabelanger | well, I think there is also the use case of intermittent issues, where first run fails, but 2nd passes. In that case, we don't get any logs and last failure post-run won't really work, but that is outside of this current issue | 14:57 |
tobiash | we could make the earlier retries available in the web ui on the buildset page | 14:57 |
tobiash | also this goes into that direction: https://review.openstack.org/632727 | 14:57 |
tobiash | this is intended to improve ways to analyze infrastructure weaknesses that lead to retries | 14:58 |
pabelanger | yes, as a zuul operator, those (retry) logs are helpful. users, maybe not | 14:58 |
tobiash | well, even for the users if they interact with project specific infrastructure in pre playbooks | 14:58 |
pabelanger | tobiash: yay on 632727 will review today, something I've also wanted | 14:59 |
tobiash | so I'd find it very useful if we record the failed builds in the database that lead to the retry | 14:59 |
pabelanger | +1 | 14:59 |
tobiash | that would make it possible to spot them in the builds tab too | 15:00 |
tobiash | and if we'd do that, we should probably enforce running the post playbooks to at least try to log something | 15:00 |
tobiash | btw, how does this work? https://zuul.openstack.org/build/5514e07eb6da4b9ea8816ecd45f8357f | 15:02 |
tobiash | just noticed that zuul summarizes the failed task in zuul-web | 15:02 |
avass | how do i override the timeout to indefinite? | 15:05 |
avass | tried setting it to 0 but that just actually set it to 0 hehe | 15:05 |
pabelanger | don't think we support indefinite for timeout | 15:05 |
avass | pabelanger: it's supposed to be indefinite if it's not set | 15:06 |
avass | pabelanger: is there any way to 'unset' it? i guess setting a very long timeout works as well | 15:07 |
pabelanger | avass: where do you see that? indefinite if not set | 15:07 |
*** spsurya has quit IRC | 15:07 | |
avass | pabelanger: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.timeout | 15:07 |
pabelanger | ah, I forgot about that | 15:08 |
tobiash | avass: set it to -1 | 15:08 |
pabelanger | was just going to suggest that | 15:08 |
tobiash | avass: but you need to also set the max_job_timeout of the tenant to -1 | 15:08 |
avass | tobiash: ah | 15:09 |
avass | that explains why that didn't work either | 15:09 |
tobiash | avass: but I'd suggest to use a high but finite timeout, otherwise you will have jobs lingering around forever in case a test case deadlocks | 15:10 |
avass | tobiash: yeah, it's just while I'm testing things while setting everything up anyways | 15:10 |
avass | tobiash, pabelanger: shouldn't the default timeout then be max_job_timeout and not indefinite? | 15:13 |
tobiash | avass: yes: https://review.openstack.org/629552 | 15:13 |
tobiash | but I still need to address the comments | 15:14 |
avass | what happens if the timeout is set higher than the limit? | 15:24 |
avass | is it automatically 0 then? | 15:24 |
mordred | tobiash, pabelanger: for the build page, we've also talked about having the executor put the log snippets for final-task-failures into the db and exposing them on that build page | 15:25 |
mordred | it obviously requires us doing the 'db-is-required' first | 15:26 |
tobiash | mordred: it already gets it via js from the log server | 15:26 |
mordred | but I think should make some of these "only admins can look at the build logs to see what went horribly wrong" errors better | 15:26 |
mordred | tobiash: yah - but there has to be a build log for that to work | 15:26 |
tobiash | mordred: ah you mean the buildlog.json directly from the executor? | 15:26 |
tobiash | great | 15:27 |
tobiash | :) | 15:27 |
mordred | we also capture the ansible output in case of ansible failure into the zuul executor logs - we could grab that text and put it into the db - not for all playbooks, but only if a final playbook fails | 15:27 |
mordred | (capturing all of the ansible output would be a ton of things into the db that would almost always be ignored) | 15:27 |
tobiash | mordred: well we have the job-output.json on the executor too (which is the same thing used now by the web ui) | 15:27 |
mordred | tobiash: indeed. but I'm not sure we get the right things in job-output.json in cases where there was a broken ansible invocation or something | 15:28 |
mordred | but yeah ... there's data somewhere that we should be able to capture and collect | 15:28 |
mordred | in those catastrophic cases | 15:28 |
tobiash | for this we have the syntax buffer we put into the buildlog.txt too | 15:29 |
jkt | I wonder why I cannot use a role defined by my parent project directly from a project's .zuul.yaml | 15:29 |
tobiash | that should be easy | 15:29 |
mordred | tobiash: yah | 15:29 |
jkt | I can sidestep that via a "forwarding" job in the parent project | 15:29 |
tobiash | jkt: you can import the roles of the other project | 15:29 |
tobiash | jkt: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.roles | 15:30 |
*** spsurya has joined #zuul | 15:30 | |
pabelanger | Yay, got untrusted job working with add_host: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/34/344d8933e08a208b674451d83af440534bd27590/post/windmill-config-deploy2/6821fe6/ara-report/ | 15:30 |
pabelanger | bastion2.yaml is playbook of interest | 15:30 |
tobiash | jkt: example: https://git.zuul-ci.org/cgit/zuul/tree/.zuul.yaml#n66 | 15:31 |
tobiash | pabelanger: yay :) | 15:31 |
pabelanger | mordred: tobiash: so, I am a little torn on opening zuul_console ports on production server. Or am I overthinking this? I guess I could firewall and only allow zuul-executors access to it | 15:32 |
pabelanger | maybe I'll just wait until new logging work is finished for that | 15:32 |
pabelanger | since ARA works as expected | 15:32 |
jkt | my usecase: just have a job which runs that child project's ci/build.sh, so *something* like run-test-command from zuul-jobs, but without inheriting from your 'unittest' job | 15:33 |
jkt | tobiash: thanks | 15:33 |
jkt | I think I'll go with that tiny forwarding job, it's actually a bit shorter | 15:33 |
jkt | tobiash: can I import playbooks from a parent project? | 15:41 |
tobiash | jkt: no, that's not possible | 15:42 |
tobiash | only roles | 15:42 |
pabelanger | yah, usually that case you'd parent to the job with playbooks you'd like to use | 15:42 |
jkt | tobiash: thanks | 15:43 |
jkt | pabelanger: see above for what I'm trying to do; I guess I'll simply copy that playbook into my job, then | 15:43 |
pabelanger | jkt: so, you want to use run-test-command, but not parent to unittests right? | 15:46 |
jkt | pabelanger: yes, that's what I wanted to do | 15:47 |
*** avass has quit IRC | 15:47 | |
jkt | pabelanger: but anyway, it's really just a trivial ansible playbook, I just copied it | 15:47 |
pabelanger | yah, I've often wanted to do that myself for reasons, but end up having to copy the playbooks into site specific zuul-jobs. | 15:48 |
*** cmurphy is now known as cmorpheus | 15:51 | |
*** quiquell is now known as quiquell|off | 15:56 | |
jkt | why am I getting a DISK_FULL message (zuul.ExecutorDiskAccountant: /tmp/tmpdzo80fgc/21510f7fe3d44b139b5e95e098884ead is using 280MB (limit=250)) like this one: http://paste.openstack.org/show/743480/ | 16:11 |
jkt | the disks on the executor and on the assigned node are definitely not full | 16:11 |
jkt | that path does not exist on the executor, anyway (unless someone is playing with mount namespaces...) | 16:13 |
*** bhavikdbavishi has joined #zuul | 16:14 | |
jkt | ah, so it's probably due to the size of the git repo itself and https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.disk_limit_per_job | 16:18 |
*** bhavikdbavishi has quit IRC | 16:18 | |
jkt | it's nice that zuul's sources are so easily grepable :) | 16:18 |
corvus | jkt: yes -- one thing to note is that zuul clones repos into the workspaces from its own internal cache, and as long as they are on the same filesystem, it should use hard links, so that much of the git repo data (ie, the blobs) won't count against the history. there's still a lot of local data (including the working directory), so some percentage of the repo will still take up space. | 16:22 |
corvus | er, s/count against the history/count against the limit/ | 16:22 |
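
The hard-link point above can be illustrated with a small, self-contained sketch. This is not Zuul's disk accounting code; it only shows the general filesystem behaviour, assuming a usage counter that counts each inode once. The names `disk_usage` and `demo`, and the directory names, are made up for the illustration.

```python
import os
import tempfile


def disk_usage(root):
    """Sum file sizes under root, counting each inode only once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to data that was already counted
            seen.add(key)
            total += st.st_size
    return total


def demo():
    """Build a fake cache plus workspace and measure them together."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "cache")          # stands in for the repo cache
        workspace = os.path.join(tmp, "workspace")  # stands in for a job workspace
        os.makedirs(cache)
        os.makedirs(workspace)

        blob = os.path.join(cache, "blob")
        with open(blob, "wb") as f:
            f.write(b"x" * 4096)

        # Hard link: a second name for the same data, no extra space used.
        os.link(blob, os.path.join(workspace, "blob"))

        # A checked-out working file is real, new data in the workspace.
        with open(os.path.join(workspace, "checked_out.txt"), "wb") as f:
            f.write(b"y" * 1000)

        return disk_usage(tmp), os.stat(blob).st_nlink
```

Here the linked blob is counted once, while the working-directory file adds to the total, which matches the remark that some percentage of the repo still takes up space.
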
*** panda is now known as panda|off | 16:23 | |
jkt | corvus: yup, my repo was about 500MB in size | 16:24 |
jkt | now, I'm trying to run a simple shell script on my build nodes; they are fedora 29 provisioned statically by nodepool | 16:25 |
jkt | I keep getting an error "Timeout exception waiting for the logger. Please check connectivity to [147.251.253.10:19885]" | 16:25 |
jkt | what sorts of communication are required to be open? I thought that it was all done via ansible over SSH | 16:25 |
corvus | pabelanger, tobiash: we could have the scheduler tell the executor if this is the last retry of a job and have it run post playbooks in that case. | 16:26 |
corvus | jkt: port 19885 needs to be open on the worker nodes in order for log streaming to work | 16:26 |
corvus | jkt: we have a plan to eliminate that requirement, but work is not complete yet. | 16:26 |
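
A quick way to confirm the firewall requirement above is to try a plain TCP connection to the port from the executor host. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and is not part of Zuul.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from the executor, `port_is_open("147.251.253.10", 19885)` would use the address from the error message quoted above. A `False` result suggests a firewall or a missing listener on the node, though it cannot tell the two apart.
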
jkt | nice, here it is, https://softwarefactory-project.io/docs/operator/nodepool_operator.html#add-a-cloud-provider | 16:27 |
pabelanger | corvus: tobiash: wfm | 16:28 |
corvus | nice. that should probably be in zuul's documentation. | 16:28 |
jkt | corvus: I'll propose a patch for this one | 16:29 |
tobiash | corvus: ++ | 16:29 |
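
The idea agreed on above (skip post playbooks after an unreachable failure unless this is the final attempt) can be written down as a tiny decision function. This is only a sketch of the proposal as discussed here, not code from Zuul; the function and parameter names are hypothetical.

```python
def should_run_post(unreachable, attempt, max_attempts):
    """Decide whether post playbooks should run after the run phase.

    unreachable:  True if ansible reported a host as unreachable.
    attempt:      1-based number of the current attempt.
    max_attempts: total attempts allowed before retry_limit is reported.
    """
    if not unreachable:
        # Normal success or failure: post playbooks run as usual.
        return True
    # Unreachable: the job will be retried, so only the last attempt is
    # worth collecting logs for, since that is the one reported to the user.
    return attempt >= max_attempts
```

With three attempts allowed, `should_run_post(True, 1, 3)` is `False` and `should_run_post(True, 3, 3)` is `True`, so logs are at least attempted for the build that ends in retry_limit.
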
*** hashar has quit IRC | 16:30 | |
corvus | tobiash: how long does your setup take? | 16:37 |
corvus | tobiash: (and why?) | 16:37 |
tobiash | corvus: we had ceph performance issues so I guess a limit of two minutes would have helped there | 16:38 |
tobiash | I need to dig deeper into the logs to check the usual duration | 16:40 |
corvus | ok, so we probably don't need to update the default. i can see how being able to change the value might be necessary. +2 | 16:40 |
tobiash | corvus: :) | 16:41 |
tobiash | corvus: I think we should also add the playbook which is being killed to the timeout log | 16:42 |
corvus | wfm | 16:42 |
corvus | tobiash: (though, you should probably be able to figure that out by looking at previous entries) | 16:42 |
corvus | (but it's fine to add it to make it easier) | 16:42 |
tobiash | yeah, but having it there makes it easy to query kibana for "Ansible timeout exceeded" and make it possible to directly filter for e.g. the setup playbook | 16:43 |
*** bhavikdbavishi has joined #zuul | 16:46 | |
corvus | pabelanger: did you ever get to the bottom of the exclude pipeline issue? | 16:59 |
pabelanger | corvus: not yet, I can start looking at it again once i finish converting this job to deploy directly from executor | 17:03 |
pabelanger | hopefully this afternoon | 17:03 |
openstackgerrit | Merged openstack-infra/zuul master: Explicitly callout ZooKeeper as ext dependency https://review.openstack.org/632732 | 17:06 |
openstackgerrit | Jan Kundrát proposed openstack-infra/zuul master: Incoming connections over 19885/TCP are needed on nodes https://review.openstack.org/633242 | 17:11 |
jkt | what is the most zuul-ish way of checking out git submodules? | 17:12 |
jkt | I'm using them for projects which are available from gerrit (and some of them requiring proper credentials, i.e., not an anonymous access) | 17:13 |
tobiash | jkt: not doing it ;) | 17:13 |
jkt | tobiash: :) | 17:13 |
tobiash | jkt: just kidding | 17:13 |
*** mrhillsman is now known as mrhillsman_lunch | 17:13 | |
jkt | I'm afraid they are the least evil thing | 17:13 |
corvus | i wish mordred were around, he just did some stuff with that... let me see if i can dig some stuff up | 17:13 |
tobiash | jkt: zuul doesn't do submodule handling itself | 17:13 |
jkt | I'm mirroring some C projects from github to our gerrit, and they do not have a stable API, so I have to somehow pin them | 17:14 |
jkt | i.e., I cannot use zuul's cross-project tracking, really | 17:14 |
jkt | s/tracking/gating/ | 17:14 |
tobiash | what we do is we add the submodules to required-projects, patch the url to point to the corresponding repo in the workspace, and then initialize the submodules | 17:14 |
tobiash | (during job runtime) | 17:14 |
corvus | jkt: since you're mirroring them, could you add your own tags? | 17:14 |
corvus | tobiash: yes, that sounds like what mordred ended up doing too... | 17:15 |
jkt | corvus: my typical workflow involves patching my code in a leaf project to adapt to a new API, *and* bumping the submodule commit hash in .gitmodules | 17:15 |
jkt | corvus: this has worked well for us for years with zuul v2 and turbo-hipster | 17:15 |
tobiash | some of our projects even use branches as moving targets, add speculative submodule states and update the base repo in a post job | 17:16 |
jkt | tobiash: is there a playbook for this auto-checkout? I noticed that the "origin" remote is /dev/null | 17:16 |
tobiash | jkt: the trick is to add them as required-project in the job | 17:16 |
tobiash | then all necessary repos will be synced to the node | 17:16 |
corvus | jkt: here's what mordred did with submodules: http://git.openstack.org/cgit/openstack-infra/system-config/tree/.zuul.yaml#n149 | 17:16 |
corvus | i think that matches what tobiash is saying | 17:16 |
corvus | but you probably won't use override-checkout | 17:17 |
tobiash | and then you can patch the remote urls of the submodules to the local repos on disk | 17:17 |
jkt | thanks, I think I understand that | 17:17 |
jkt | is there a playbook for this auto-patching? :) | 17:17 |
corvus | you can do that in a pre-playbook | 17:17 |
corvus | mordred ended up not patching the urls, but instead, moved the submodule git repos into place: http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/zuul/gerrit/repos.yaml | 17:17 |
tobiash | we didn't upstream one yet | 17:17 |
tobiash | that's the other possibility | 17:18 |
tobiash | moving is probably more efficient than patching url and initialize | 17:18 |
corvus | (of course, you still need to check out the right sha) | 17:18 |
corvus | (mordred's playbook doesn't do that since we had zuul check out the one we wanted) | 17:19 |
corvus | btw, if one wanted to upstream something, i think one could make that first task, where the repos are moved, generic by iterating over zuul.projects | 17:19 |
corvus | (filtering for required=True) | 17:20 |
corvus | or by reading .gitmodules | 17:20 |
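
The "reading .gitmodules" option above could start from something like the sketch below. It parses the file and matches each submodule URL against a set of locally available checkouts. It is an illustration only: `parse_gitmodules`, `local_overrides` and the `checkouts` mapping (project name to directory) are hypothetical, and a real job would obtain the checkout locations from whatever Zuul synced to the node.

```python
def parse_gitmodules(text):
    """Parse .gitmodules content into {name: {"path": ..., "url": ...}}."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[submodule") and line.endswith("]"):
            name = line[len("[submodule"):-1].strip().strip('"')
            current = modules.setdefault(name, {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


def local_overrides(modules, checkouts):
    """Map submodule name -> local checkout directory.

    A submodule matches when its URL (minus a trailing ".git") ends with
    "/<project name>" for some project in `checkouts`.
    """
    overrides = {}
    for name, info in modules.items():
        url = info.get("url", "")
        trimmed = url[:-4] if url.endswith(".git") else url
        for project, directory in checkouts.items():
            if trimmed == project or trimmed.endswith("/" + project):
                overrides[name] = directory
    return overrides
```

The resulting mapping says which submodules can be served from repos already on disk; either approach mentioned above (rewriting the submodule URL or moving the repo into place) could then be applied per entry, followed by checking out the pinned sha.
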
openstackgerrit | Merged openstack-infra/zuul master: Make setup playbook timeout configurable https://review.openstack.org/633206 | 17:26 |
openstackgerrit | Merged openstack-infra/zuul master: Improve logging around ansible timeouts https://review.openstack.org/633191 | 17:26 |
pabelanger | WOOT! | 17:28 |
pabelanger | https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/a7/a700ecc334444b1102887f429e8125866c8ed1d1/post/windmill-config-deploy/62c4d25/job-output.txt | 17:28 |
pabelanger | actually did CD directly from zuul-executor with untrusted job | 17:28 |
pabelanger | Yay! | 17:28 |
SpamapS | pabelanger: neat | 17:29 |
pabelanger | it is still nested ansible, but add_host totally works | 17:29 |
SpamapS | I think nested ansible is the way to go. Zuul's Ansible just isn't set up for large scale prod deployment. | 17:30 |
SpamapS | We actually don't have any VM's to touch with our CD, just kubernetes and terraform, so a throw-away Zuul VM with secrets to talk to those works great. | 17:30 |
SpamapS | What does not work great is "oh that failed because of X and now we need to retry.. fuuuu" | 17:30 |
pabelanger | Yah, last time I tried, groups of groups was the blocker from running production playbooks from executor | 17:30 |
SpamapS | We have to land pretend commits to retry :-/ | 17:30 |
corvus | SpamapS: if you have a minute, i think https://review.openstack.org/623927 is pretty close to being in shape and iirc, you had early thoughts on that. | 17:33 |
SpamapS | oh yay | 17:33 |
corvus | (i think we want to leave it open for more review, so don't +3 it :) | 17:33 |
corvus | maybe i'll send out email early next week with an eye to merging it by the end of the week... | 17:33 |
SpamapS | ACK, (yeah specs I tend to think we need more than just 2x+2's) | 17:33 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Revert "Add a timeout for the image build" https://review.openstack.org/633252 | 17:36 |
*** jpena is now known as jpena|off | 17:44 | |
*** pvinci has joined #zuul | 17:45 | |
pvinci | Hello. Is this the best forum to discuss proposals for changes? | 17:47 |
openstackgerrit | Merged openstack-infra/zuul master: executor: properly format error exception https://review.openstack.org/630928 | 17:49 |
tobiash | pvinci: if you mean changes to zuul, then yes :) | 17:49 |
corvus | pvinci: it is a forum to do so, and a good place to start, and at the very least, if a different forum is appropriate, we can figure that out here :) | 17:49 |
pvinci | I ran into an issue with Repo in merger.py failing due to an unknown hostkey. I threw in 10 lines of code to work past it, but it's too trusting of a position to take. I think the gerrit driver should be extended with a 'hostkey' entry for verifying the end host system when using ssh. | 17:53 |
corvus | pvinci: hrm, i thought we had the mergers and executors automatically accept a new host key the first time. and if you want to provide a host key, you can write a .ssh/known_hosts file... | 17:55 |
corvus | pvinci: this code should auto-add the host key if it isn't there already: http://git.zuul-ci.org/cgit/zuul/tree/zuul/merger/merger.py#n109 | 17:56 |
*** bhavikdbavishi has quit IRC | 17:57 | |
pvinci | That's the code I changed. See the pass in line 122? | 17:58 |
pvinci | if not os.path.exists(path): seen = False with open(path, 'r') as kh: for line in kh.readlines(): if line.startswith(url.hostname): seen = True break if not seen: with open(path, 'a') as kh: kh.write(self.hostkey) | 17:58 |
pvinci | guess that doesn't work here. ;) | 17:58 |
corvus | pvinci: you can copy/paste your code into http://paste.openstack.org/ and then copy the resulting url here to share it | 17:59 |
pvinci | http://paste.openstack.org/show/743484/ | 17:59 |
corvus | pvinci: but i think i understand your change. what i don't understand is what problem you ran into that you need to solve. did the key change? | 18:00 |
pvinci | git.exc.GitCommandError: Cmd('git') failed due to: exit code(128) cmdline: git clone ssh://xxx /var/lib/zuul/executor-git/xxx stderr: 'Cloning into '/var/lib/zuul/executor-git/xxx'... Warning: Identity file /var/ssh/xxx not accessible: No such file or directory. Host key verification failed. fatal: Could not read from remote repository. | 18:05 |
corvus | pvinci: is there a log line that precedes that which starts with "Unable to set up known_hosts" ? | 18:07 |
pvinci | There is an Exception: Signature verification (ssh-ed25519) failed. | 18:09 |
corvus | pvinci: are you using gerrit or github? | 18:12 |
pvinci | gerrit | 18:12 |
corvus | pvinci: what version? | 18:12 |
pvinci | 2.14.6 | 18:13 |
corvus | pvinci: you may be running into this bug: https://bugs.chromium.org/p/gerrit/issues/detail?id=6504 | 18:13 |
pvinci | Thanks! That is helpful. | 18:15 |
*** mrhillsman_lunch is now known as mrhillsman | 18:15 | |
corvus | pvinci: i believe in that case we may not be able to automatically do the right thing in zuul. so i think you are correct to solve this by writing the host key to known_hosts. but i think rather than updating zuul to do this, it may be simpler for you to write the known_hosts file yourself. as long as it's there before zuul starts, it will use it. | 18:15 |
pvinci | Yes. I thought about having ansible drop in the file, but that seemed like the long way around. | 18:16 |
pvinci | I can do that. | 18:16 |
corvus | pvinci: if we were to add host key support to zuul like you suggest, it would be a bit more work since we have to support multiple connections, so we'd really have to manage multiple entries, and the whole lifecycle of the file. | 18:16 |
pvinci | I can add it via ansible. Thanks. | 18:17 |
corvus | pvinci: yeah, i *think* that "have ansible drop in the file" is probably the way most folks are leaning right now. i could definitely see us going the other way though; it's a tough call. :) | 18:17 |
openstackgerrit | Merged openstack-infra/zuul master: Fix noop job toDict() https://review.openstack.org/630409 | 18:17 |
pvinci | Also, how do you feel about doing shallow git clones instead of the full clones we do now? | 18:20 |
corvus | pvinci: a lot of testing requires the full history, or the ability to change branches, so a full clone is the most universally applicable. i'm also not sure that all the speculative merges would succeed or produce the same output without the full clones. however, it's worth noting that as long as the executor's merger and build directories are on the same filesystem, they will use hard links and so the | 18:22 |
corvus | disk space cost is much lower. | 18:22 |
pvinci | ok. | 18:26 |
pvinci | One last thing, then I'll leave you alone for a while. I see this in the logs as well. AttributeError: 'MergeJob' object has no attribute 'updated'. | 18:28 |
corvus | pvinci: it's harmless, we should fix the log line to remove that :) | 18:28 |
pvinci | ok. Thanks. I can contribute that. Just wasn't sure that it wouldn't mask an issue. | 18:30 |
corvus | i think it's just a very old log line which didn't keep up with the changes around it. should be fine to clean up, thanks! | 18:30 |
pvinci | My use case here is a little different. I'm relying on zuul to trigger on specific changes in an upstream project. | 18:36 |
corvus | pvinci: that's great! you're not alone though, several folks here (openstack and openlab come to mind) do that | 18:38 |
tobiash | corvus, pvinci: I've seen this attribute error when there was no merger that responded to the 'cat' request | 18:41 |
pvinci | What is "cat"? | 18:41 |
tobiash | zuul relies on the zuul-executors and mergers to get the configuration from the git repos | 18:41 |
tobiash | it uses gearman jobs to distribute that | 18:42 |
tobiash | and one operation is 'cat' that asks the merger 'give me all zuul.yaml filed from that repo on branch x' | 18:42 |
tobiash | s/filed/files | 18:43 |
corvus | (like "git cat-file") | 18:43 |
tobiash | corvus: what I mean is here: https://git.zuul-ci.org/cgit/zuul/tree/zuul/configloader.py#n1590 | 18:45 |
corvus | oh, so we might be masking that error | 18:46 |
tobiash | if that times out, the job has no updated attribute and the scheduler is doomed | 18:46 |
corvus | same result, wrong error message | 18:46 |
tobiash | yes, we get some stack trace but actually want to know that we couldn't fetch some data from a repo | 18:47 |
tobiash | further we escape from the whole loop then | 18:47 |
pabelanger | I know why, but I sometimes wish a job could set the timer value for a periodic pipeline | 18:51 |
pabelanger | mordred: when you happen to be around again, I cannot seem to figure out why I am getting 'Waiting on logger' when using add_host from executor. I believe I have proper ports open on firewall: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/f5/f5d9ac249e1a53f82e8415ddbadde9006122c722/post/windmill-config-deploy/bb7b2c1/job-output.html#l114 | 18:56 |
pabelanger | eventually output is rendered, but not realtime, happens in blobs at end of task | 18:57 |
corvus | pabelanger: do you start zuul_console? | 18:59 |
corvus | http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/prepare-workspace/tasks/main.yaml#n1 | 18:59 |
pabelanger | corvus: yup! | 18:59 |
pabelanger | https://github.com/ansible-network/windmill-config/blob/master/tests/playbooks/pre.yaml | 19:00 |
Shrews | This is frustrating. nodepool-builder works locally for me. | 19:04 |
clarkb | could it be an arm specific issue? | 19:04 |
clarkb | nb03 is our arm builder | 19:04 |
Shrews | The obvious thing to look at would be permissions (particularly for the --logfile option to disk-image-create), but that looks ok | 19:05 |
Shrews | clarkb: i don't see how that would make any difference here | 19:05 |
openstackgerrit | Paul Vinciguerra proposed openstack-infra/zuul master: configloader.py: Not all jobs have updated attribute. https://review.openstack.org/633259 | 19:05 |
Shrews | it built an arm image as recently as yesterday | 19:06 |
corvus | Shrews: the revert failed on a flaky test, so we have time to change our minds about that. i've rechecked it. | 19:06 |
Shrews | corvus: yeah, let's remove the +A from it so we can poke more | 19:07 |
Shrews | i mean, i'm not sure what else to poke, but... | 19:08 |
*** luizbag has quit IRC | 19:08 | |
corvus | Shrews: done. it'll take a while for the check-recheck anyway, hopefully we'll have a zuul+1 ready if we decide to +3 it again. | 19:08 |
Shrews | i could temporarily change it to not use the --logfile option to see if it at least starts to build | 19:09 |
Shrews | think i'll do that | 19:09 |
*** bjackman has joined #zuul | 19:12 | |
Shrews | nope, that's not it | 19:12 |
corvus | Shrews: as a sanity check, do you want to do a manual revert on nb03 and make sure that works? | 19:13 |
corvus | Shrews: usually i check out a copy in my homedir at the right commit, then "sudo pip3 install ." | 19:13 |
corvus | oh you may have already done that to do the --logfile thing anyway, huh. nm. :) | 19:14 |
corvus | main point is, if '--logfile' isn't the issue, then i don't know what is and i wonder whether a revert would even fix it. | 19:15 |
Shrews | corvus: i just edited the installed builder.py, didn't checkout from git | 19:15 |
corvus | ah k. probably want to do the pip install thing to test the revert. | 19:15 |
Shrews | corvus: yeah | 19:16 |
Shrews | corvus: yup, revert works | 19:20 |
Shrews | *sigh* | 19:20 |
Shrews | what the actual fudge | 19:20 |
jkt | I wonder what is the reason behind setting the origin remote's url to /dev/null | 19:21 |
corvus | jkt: because the actual origin (zuul) is not accessible, and the nominal origin (gerrit/github) has the wrong data (it doesn't have the speculative future state that zuul does). | 19:22 |
corvus | (also, the nominal origin may be inaccessible too, if it requires credentials) | 19:23 |
jkt | corvus: okay, thanks... I'm thinking about how to implement these submodules I mentioned earlier | 19:23 |
jkt | a project might specify a ref which is not part of the default branch's history, for example, and I think one needs to have origin available to fetch that | 19:24 |
corvus | jkt: the repo on disk should have all of the upstream refs | 19:24 |
corvus | not just the default branch, it should have all of them | 19:24 |
jkt | okay, that's good to know | 19:24 |
corvus | (this is, btw, a difference between zuulv2 and v3 -- v2 didn't always have that) | 19:25 |
pabelanger | I actually ran into that issue recently with post pipeline job, origin was /dev/null now on production server. Okay for now, as going to try and have zuul push the repo over pulling it from github.com | 19:26 |
jkt | so it's really "just" a matter of (for each cloned project) recursively list its submodules, assert that it's a relative URL, and update | 19:26 |
corvus | jkt: that sounds correct to me, but i hide under my desk when anyone says submodules :) | 19:26 |
tobiash | submodules | 19:27 |
* corvus hides | 19:28 | |
jkt | corvus: according to the docs, `git submodule XXX` treats the relative URLs as relative to the default remote, i.e., origin | 19:28 |
tobiash | :) | 19:28 |
corvus | tobiash: ^ what do you do to your urls? | 19:29 |
*** bjackman has quit IRC | 19:29 | |
tobiash | I'm not directly involved in that project but I believe they patch the submodule urls to be file:///home/zuul/src/<project> | 19:32 |
tobiash | and then just git submodule update --init | 19:33 |
corvus | ah that makes sense. jkt ^ | 19:33 |
tobiash | s/patch/patch in a pre-playbook | 19:33 |
corvus | we should write this up and add it to https://zuul-ci.org/docs/zuul/user/howtos.html | 19:34 |
tobiash | good idea | 19:34 |
tobiash | so that's the 'easy' part of submodules | 19:34 |
* corvus hides | 19:35 | |
tobiash | lol | 19:35 |
tobiash | we have another project that uses a base repo, gates the submodules and updates the references in a post job | 19:36 |
*** electrofelix has quit IRC | 19:37 | |
corvus | i just emitted a long email to zuul-discuss on speculative container images | 19:38 |
corvus | that's item #1 on today's todo list. | 19:39 |
corvus | (writing the email) | 19:39 |
jkt | if I "just" add a call to https://gitpython.readthedocs.io/en/stable/reference.html#git.objects.submodule.root.RootModule.update into zuul/executor/server.py , perhaps guarded by an option (similar to override_*), would that be acceptable upstream? | 19:40 |
tobiash | and I've another advice when using submodules: refrain from using recursive submodules, if possible | 19:40 |
corvus | jkt: i think submodules are too confusing and dangerous for zuul to work with; i'd prefer to get a solid set of shared roles in zuul-jobs to work with them | 19:40 |
jkt | tobiash: :(, we're working with boost.org | 19:40 |
tobiash | jkt: you're doomed ;) | 19:41 |
jkt | corvus: unless the jobs/whatever performs an ACL check within Zuul, there's a nice possibility of repo ACL bypass if the zuul uses a gerrit account that is powerful enough | 19:42 |
tobiash | corvus: yes, that's what I suggest to any projects, if you want to work with submodules, don't try to tell zuul about it, handle it in the jobs and add all submodules to required projects so the respective repos are on disk on the node | 19:42 |
tobiash | jkt: if you have restrictive acls you should split the projects by tenants in zuul | 19:43 |
jkt | tobiash: and use separate gerrit users in there as well, true | 19:43 |
corvus | jkt: do you mean if we supported submodules in zuul? yes, that's one of the concerns. if you mean now, then what tobiash says -- zuul won't check out projects which aren't in the tenant, so if you don't add that project to that tenant, zuul won't let it be used in required-projects. | 19:44 |
corvus | (which is something we should keep in mind if we add support for implicit required-projects :) | 19:44 |
jkt | corvus: well, a simple implementation of submodules would just call `git submodule update --init` from within a trusted playbook, right? | 19:45 |
jkt | that doesn't check required-projects | 19:45 |
tobiash | jkt: that won't work, as the node has no credentials at all to access gerrit | 19:45 |
Shrews | corvus: i'm stumped. i've output the ENV vars being used, duplicated those in my shell, and ran the command (as nodepool) output in the log and it works. This points to something with the Popen() call, but that works locally for me. | 19:46 |
tobiash | zuul prepares it according to the job and what's allowed to the tenant and pushes the prepared repos to the node | 19:46 |
jkt | tobiash: I meant a pre-playbook, running on the executor/merger/whatever | 19:46 |
tobiash | the job itself has no access to the scm | 19:46 |
tobiash | jkt: even a trusted pre-playbook doesn't have access to the scm | 19:46 |
jkt | ah well | 19:46 |
jkt | that rules out that simple method, then | 19:46 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects https://review.openstack.org/613143 | 19:47 |
corvus | ^ left a note for myself to remember to deal with that case :) | 19:47 |
jkt | so I would *have* to ask git checkout for a list of submodules, massage their URLs to the already existing checkouts, `git submodule update --init` | 19:47 |
Shrews | though i'm running python 3.6.7 locally, builders have 3.5.2, so maybe a lib difference? | 19:47 |
corvus | Shrews: i can take a look after lunch | 19:48 |
tobiash | jkt: yes | 19:48 |
jkt | that's quite some rewriting, especially if this is to work recursively | 19:48 |
jkt | -> not a job for a Friday evening. | 19:48 |
tobiash | jkt: that's why I said, refrain from recursive submodules ;) | 19:48 |
jkt | ECANNOT :( | 19:49 |
tobiash | then you'll need to do this rewriting recursively | 19:49 |
openstackgerrit | Paul Vinciguerra proposed openstack-infra/zuul master: configloader.py: Not all jobs have an .updated attribute. https://review.openstack.org/633259 | 19:49 |
corvus | we should definitely make that a role and put it in zuul-jobs. | 19:50 |
tobiash | jkt: my advice for this: don't try this with distinct standard ansible tasks, use a custom python module | 19:50 |
corvus | we actually have a test framework in zuul-jobs for roles with python modules, so we should be able to get this pretty solid. | 19:51 |
tobiash | the non-recursive would be relatively easy per shell tasks, but for the full blown recursive version I'd prefer a python module and as corvus says using the zuul-jobs test framework :) | 19:51 |
corvus | ug http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/upload-logs-swift/library/test_zuul_swift_upload.py | 19:51 |
corvus | that was supposed to be "e.g.," not "ug". | 19:52 |
corvus | i'm gonna go get lunch now. | 19:52 |
jkt | thanks a lot for these, I'll be happy to contribute this once I get it working | 19:53 |
jkt | next week :) | 19:53 |
*** spsurya has quit IRC | 20:37 | |
*** pvinci has quit IRC | 21:06 | |
dmsimard | We're getting close enough to a release of ARA 1.0 that I've started slowly looking at what it would mean for Zuul -- I've created an etherpad to highlight what's new and some ideas to help foster discussion: https://etherpad.openstack.org/p/ara-1.0-in-zuul | 21:17 |
dmsimard | It's still a work in progress and it doesn't have all the answers yet but hopefully it's a good start :p | 21:19 |
dmsimard | I'm excited about the API and some of the other features but I also feel like a lot of that depends on the amount of reliance (or coupling) Zuul is interested in having with ara | 21:21 |
dmsimard | We're friday on the way out so I can send an email to zuul-discuss too if it's appropriate :) | 21:22 |
*** rlandy has quit IRC | 21:37 | |
openstackgerrit | Merged openstack-infra/nodepool master: Revert "Add a timeout for the image build" https://review.openstack.org/633252 | 22:37 |
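
Editor's note on the known_hosts discussion (17:53 to 18:17): the approach corvus recommends is to have the known_hosts file in place before zuul starts, rather than patching merger.py. The following is only a minimal sketch of that idea, not zuul's code and not pvinci's patch. The function name `ensure_known_host`, the example host, and the placeholder key are all hypothetical. It assumes unhashed known_hosts entries, and that a non-default ssh port is written in OpenSSH's `[host]:port` form.

```python
import os
import tempfile


def ensure_known_host(path, host_pattern, hostkey_line):
    """Append hostkey_line to the known_hosts file at path unless an
    entry for host_pattern already exists. Returns True if written.

    Hypothetical helper: only plain (unhashed) entries are matched.
    """
    parent = os.path.dirname(path)
    if parent:
        # ssh expects ~/.ssh to be private to the owning user.
        os.makedirs(parent, mode=0o700, exist_ok=True)

    lines = []
    if os.path.exists(path):
        with open(path, 'r') as kh:
            lines = kh.readlines()

    for line in lines:
        fields = line.split()
        # The first field is a comma-separated list of host patterns.
        if fields and host_pattern in fields[0].split(','):
            return False

    with open(path, 'a') as kh:
        # Avoid gluing the new entry onto an unterminated last line.
        if lines and not lines[-1].endswith('\n'):
            kh.write('\n')
        kh.write(hostkey_line.rstrip('\n') + '\n')
    return True


if __name__ == '__main__':
    # Demo against a throwaway directory; the key below is a placeholder.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, '.ssh', 'known_hosts')
    demo_entry = '[gerrit.example.org]:29418 ssh-ed25519 AAAAPLACEHOLDER'
    print(ensure_known_host(demo_path, '[gerrit.example.org]:29418', demo_entry))
```

Unlike matching with `line.startswith(hostname)`, comparing the whole host field avoids treating `gerrit.example.org.evil` as already trusted. In practice the same effect can be had by having ansible drop the file in, as agreed in the log.
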
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!