*** egustafson has joined #zuul | 00:24 | |
egustafson | Hi folks, i'm using the quickstart guide and added github integration and also added a static vm as an external worker. when i submit a pr, the event triggers a build just fine and runs the plays defined in the .zuul.yaml of the repo. my problem is in the log gathering phase at the end. from the quickstart guide, there is a post play that runs that gathers the logs. before adding an external worker, this was | 00:24 |
---|---|---|
egustafson | working fine. from the quickstart guide, the post play gathers the logs and uses rsync to get them from A to B. It appears the rsync fails on a chown command? does the error reflect chown failing on the executor container? I copied some bits from stdout, after running docker compose up, in hopes it will help isolate where i could troubleshoot next. | 00:24 |
egustafson | 2019-07-20 22:34:20,719 DEBUG zuul.AnsibleJob: [e: 6cdc26c0-ab3e-11e9-925f-fc78411c065e] [build: 4d50713c22a24c9d9639f58ee7ff50fb] Ansible output: b'TASK [fetch-output : Collect log output dest={{ log_path }}/, mode=pull, sr | 00:25 |
egustafson | c={{ zuul_output_dir }}/logs/, verify_host=True] ***' | 00:25 |
egustafson | executor_1 | 2019-07-20 22:34:21,616 DEBUG zuul.AnsibleJob: [e: 6cdc26c0-ab3e-11e9-925f-fc78411c065e] [build: 4d50713c22a24c9d9639f58ee7ff50fb] Ansible output: b'fatal: [1.2.3.4]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync | 00:25 |
egustafson | --delay-updates -F --compress --archive --rsh=/usr/bin/ssh -S none -o Port=22 --out-format=<<CHANGED>>%i %n%L testuser@1.2.3.4:/home/genadm/zuul-output/logs/ /var/lib/zuul/builds/4d50713c22a24c9d9639f58ee7ff50fb/work/logs/", "msg": "rsync: | 00:25 |
egustafson | chown \\"/var/lib/zuul/builds/4d50713c22a24c9d9639f58ee7ff50fb/work/logs/.\\" failed: Invalid argument (22)\\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]\\n", "rc": 23}$ | 00:25 |
egustafson | executor_1 | 2019-07-20 22:34:21,940 DEBUG zuul.AnsibleJob: [e: 6cdc26c0-ab3e-11e9-925f-fc78411c065e] [build: 4d50713c22a24c9d9639f58ee7ff50fb] cmd: /usr/bin/rsync --delay-updates -F --compress --archive --rsh=/usr/bin/ssh | 00:25 |
egustafson | executor_1 | 2019-07-20 22:34:21,940 DEBUG zuul.AnsibleJob: [e: 6cdc26c0-ab3e-11e9-925f-fc78411c065e] [build: 4d50713c22a24c9d9639f58ee7ff50fb] -S none -o Port=22 --out-format=<<CHANGED>>%i %n%L testuser@1.2.3.4:/home/genadm/ | 00:25 |
egustafson | zuul-output/logs/ | 00:25 |
egustafson | executor_1 | 2019-07-20 22:34:21,943 DEBUG zuul.AnsibleJob: [e: 6cdc26c0-ab3e-11e9-925f-fc78411c065e] [build: 4d50713c22a24c9d9639f58ee7ff50fb] msg: 'rsync: chown "/var/lib/zuul/builds/4d50713c22a24c9d9639f58ee7ff50fb/work/logs/. | 00:26 |
egustafson | " | 00:26 |
fungi | egustafson: i'm not certain, but that may imply that you're lacking an equivalent user/group on the executor to the user/group which owned some of the files being archived from the job node | 00:35 |
fungi | --archive implies --group and --owner preservation options, according to the rsync manpage | 00:37 |
*** sgw has joined #zuul | 00:37 | |
fungi | since the files are in /home/genadm on the job node, maybe they're owned by a genadm user which doesn't exist on your executor? | 00:38 |
egustafson | interesting. in order to hook in the external nodes, it appeared all i needed was the host id on the nodepool conf file and to persist the key of the executor in the authorized_keys on the nodes | 00:38 |
fungi | (or maybe the testuser user owns them and it doesn't exist on the executor?) | 00:38 |
fungi | one workaround would be to make sure there's a zuul user on the nodes (that's how we do the opendev test nodes) and have the job chown/chgrp logs to that before wrapping up | 00:39 |
egustafson | i had changed the ip and username for the sake of posting but appears i missed one. i tried to make it easier to follow. it was all technically genadm where all instances of testuser appear | 00:39 |
fungi | but it also may be that we shouldn't be relying on --archive and should instead have files always be owned by the account used on the executor | 00:40 |
egustafson | is /var/lib/zuul/builds/ a path on the executor container? | 00:42 |
fungi | yes | 00:42 |
fungi | changing the rsync command in the log collection role in zuul-jobs is certainly a broader discussion, but i wouldn't rule out the possibility that it merits being more flexible in the face of file ownership mismatches | 00:43 |
egustafson | not sure if this is expected, there is nothing under /var/lib/zuul/builds | 00:45 |
egustafson | perhaps this gets cleaned up at some point thereafter | 00:45 |
fungi | our executors (granted they're very active) have numerous build uuid named directories under the /var/lib/zuul/builds/ directory | 00:47 |
egustafson | right, i see the uuid in the chown command that failed in the above log. when i exec into the running container and ls -l on /var/lib/zuul/builds/ there are no directories/files | 00:48 |
egustafson | root@5637b4f75568:/# ls -l /var/lib/zuul/builds/ | 00:49 |
egustafson | total 0 | 00:49 |
fungi | and that's the executor container? (not, say, the scheduler container) | 00:51 |
fungi | though yeah, i don't know if it might clean up after errors like that | 00:52 |
corvus | executors clean up the build dirs. you can run "zuul-executor keep" to keep them around for debugging purposes | 00:52 |
egustafson | yes | 00:52 |
egustafson | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES | 00:52 |
egustafson | 5637b4f75568 zuul/zuul-executor "/usr/bin/dumb-init …" 3 weeks ago Up 8 hours examples_executor_1 | 00:52 |
corvus | (that's a runtime toggle, just run that in a shell in the executor container after it's already running) | 00:53 |
egustafson | container id aligns with the output | 00:53 |
corvus | and yeah, maybe this just doesn't come up that often if folks use "zuul" users in both places; changing that in fetch-logs may be warranted | 00:56 |
egustafson | good tip on keeping the logs. if there are any suggested tweaks i can make to the rsync command in the role, let me know. | 00:57 |
egustafson | not sure where a 'zuul' user would come from. there is no zuul user on the executor container (does not appear to be) and it did not appear i needed that user when i added the external node. i can add it, just not sure where you would be suggesting | 00:58 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP download and display a log file https://review.opendev.org/671912 | 00:59 |
egustafson | i'll dive deeper on this monday when i return to the office. sometimes stepping away for a bit will allow me to return with a fresh insight into next steps. i'll circle back for any suggestions to tweak on my end to get the rsync to pass. | 01:03 |
egustafson | thanks folks | 01:03 |
corvus | egustafson: if i'm following fungi's reasoning correctly, then i think creating a zuul user inside the executor container may be a quick fix for this (but obviously, that's impermanent) | 01:04 |
corvus | and the long-term fix is to alter the fetch-logs role in zuul-jobs | 01:05 |
corvus | tristanC, mordred: ^ https://review.opendev.org/671912 is terrible and i would like to talk to you about how it should be structurally reworked at some point. (but -- it's something to point at, and it shows all the functional parts we need working) | 01:06 |
corvus | since it's not quite possible to navigate to the page, here's what it looks like: https://imgur.com/a/pDqF21M | 01:08 |
corvus | (that's a console log fetched async from the logserver and rendered) | 01:09 |
*** rfolco|rover has quit IRC | 01:13 | |
*** bhavikdbavishi has joined #zuul | 04:10 | |
*** altlogbot_2 has quit IRC | 04:56 | |
*** altlogbot_1 has joined #zuul | 04:58 | |
*** bhavikdbavishi has quit IRC | 05:22 | |
*** bjackman has joined #zuul | 05:29 | |
*** bhavikdbavishi has joined #zuul | 05:34 | |
*** bjackman has quit IRC | 05:57 | |
*** dmsimard7 has joined #zuul | 06:35 | |
*** dmsimard has quit IRC | 06:36 | |
*** dmsimard7 is now known as dmsimard | 06:40 | |
*** bjackman has joined #zuul | 06:50 | |
*** bhavikdbavishi has quit IRC | 07:14 | |
*** bhavikdbavishi has joined #zuul | 07:35 | |
*** bhavikdbavishi has quit IRC | 08:45 | |
*** gtema has joined #zuul | 08:57 | |
*** bhavikdbavishi has joined #zuul | 08:58 | |
*** bhavikdbavishi has quit IRC | 09:15 | |
*** armstrongs has quit IRC | 09:18 | |
*** gtema_ has joined #zuul | 09:26 | |
*** gtema has quit IRC | 09:26 | |
*** bhavikdbavishi has joined #zuul | 09:26 | |
*** armstrongs has joined #zuul | 09:27 | |
*** gtema_ has quit IRC | 09:34 | |
*** gtema has joined #zuul | 09:35 | |
*** armstrongs has quit IRC | 09:36 | |
*** bhavikdbavishi has quit IRC | 09:42 | |
*** tosky has joined #zuul | 09:42 | |
*** hwangbo has quit IRC | 10:15 | |
*** gtema has quit IRC | 10:31 | |
*** bhavikdbavishi has joined #zuul | 10:48 | |
*** bhavikdbavishi has quit IRC | 11:15 | |
*** bhavikdbavishi has joined #zuul | 11:29 | |
*** sshnaidm|off is now known as sshnaidm | 12:15 | |
*** bjackman has quit IRC | 13:52 | |
*** bhavikdbavishi has quit IRC | 14:42 | |
*** bhavikdbavishi has joined #zuul | 15:00 | |
*** bjackman has joined #zuul | 15:03 | |
*** bhavikdbavishi has quit IRC | 15:19 | |
*** bhavikdbavishi has joined #zuul | 16:04 | |
*** bhavikdbavishi has quit IRC | 16:27 | |
*** bhavikdbavishi has joined #zuul | 17:07 | |
*** bjackman has quit IRC | 18:43 | |
*** bhavikdbavishi has quit IRC | 19:34 | |
*** mattw4 has joined #zuul | 20:15 | |
*** mattw4 has quit IRC | 21:42 | |
*** altlogbot_1 has quit IRC | 22:08 | |
*** altlogbot_0 has joined #zuul | 22:11 | |
*** tosky has quit IRC | 23:25 |
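
Editorial note on fungi's point at 00:37 that --archive implies --group and --owner: the rsync manpage documents --archive (-a) as shorthand for -rlptgoD. The sketch below is not the zuul-jobs role's actual code; it only shows, under that assumption, which flags remain if ownership preservation is dropped, so that pulled files would simply belong to whichever account runs rsync on the executor. The helper names `archive_without_ownership` and `build_pull_command` are hypothetical, the source path is the one from the pasted log, and the destination is a shortened placeholder. Whether this would actually clear the chown error in this setup was not confirmed in the discussion.

```python
# Long-form options that rsync's --archive (-a) stands for, i.e. -rlptgoD.
ARCHIVE_FLAGS = [
    "--recursive",  # -r
    "--links",      # -l
    "--perms",      # -p
    "--times",      # -t
    "--group",      # -g
    "--owner",      # -o
    "--devices",    # first half of -D
    "--specials",   # second half of -D
]


def archive_without_ownership():
    """Return the --archive flag set minus owner/group preservation."""
    return [flag for flag in ARCHIVE_FLAGS if flag not in ("--owner", "--group")]


def build_pull_command(src, dest):
    """Build (but do not run) an rsync argument list for pulling logs.

    Hypothetical helper for illustration only; the real role passes
    additional options that are omitted here.
    """
    return ["rsync", *archive_without_ownership(), src, dest]


if __name__ == "__main__":
    cmd = build_pull_command(
        "genadm@1.2.3.4:/home/genadm/zuul-output/logs/",  # path from the log above
        "work/logs/",  # placeholder destination, not the full build path
    )
    print(" ".join(cmd))
```

On the command line, rsync also accepts `--archive --no-owner --no-group`, which should select the same set of flags.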