tristanC | spredzy: let's just remove the zuul src dir before the job starts, and that's it | 00:25 |
---|---|---|
tristanC | pabelanger: why would you like to redesign the job for a static node? | 00:27 |
tristanC | pabelanger: static driver is merged in nodepool, are you saying we shouldn't use it?! | 00:27 |
pabelanger | tristanC: right, I'm suggesting we just use a node from nodepool for tox jobs, since that is how we designed them upstream. I am sure we can get them working on a static node, but if awx first moves to a VM, that will unblock the testing with zuul. | 00:36 |
tristanC | spredzy: https://github.com/ansible/zuul-config/pull/33 should fix static node cleanup | 00:45 |
pabelanger | we really should be first proposing changes to base-minimal-test, to avoid merging untested code | 00:49 |
pabelanger | but seems ansible/zuul-config doesn't have a job for that | 00:49 |
pabelanger | https://github.com/ansible/zuul-config/commit/7bb2282039c7d3241f9503916437d7679bb1ffa5#diff-dbd9cd0ce7e9a5628770143f1488ff59L48 | 00:50 |
pabelanger | that should be reverted | 00:50 |
pabelanger | I'll be online tomorrow to chat with awx team. Chat more in the morning | 00:51 |
matburt | pabelanger tristanC Thanks for yalls help... we're just having some trouble with the checkout dir hanging around between test runs | 00:51 |
*** tristanC has quit IRC | 01:01 | |
*** tristanC has joined #softwarefactory | 01:01 | |
*** nilashishc has joined #softwarefactory | 04:36 | |
*** nilashishc has quit IRC | 06:02 | |
*** nilashishc has joined #softwarefactory | 06:04 | |
*** nilashishc has quit IRC | 06:06 | |
*** nilashishc has joined #softwarefactory | 06:06 | |
tristanC | there are now 7 executors on sf-project.io and we updated the zuul webui to the new patternfly-react interface | 07:41 |
*** jangutter has quit IRC | 07:55 | |
*** jangutter has joined #softwarefactory | 07:55 | |
*** sshnaidm|afk is now known as sshnaidm | 08:29 | |
spredzy | tristanC: looks good with the new patternfly-react interface :) | 08:35 |
tristanC | spredzy: shouldn't we give https://github.com/ansible/zuul-config/pull/33/files a try? (e.g. merge, check if it works, revert if not, move-on if it worked...) | 09:05 |
spredzy | wait, let me put it on WIP. Waiting to hear back from pabelanger on an alternative approach today | 09:09 |
spredzy | Oopsie, that's not my PR but yours :) | 09:10 |
spredzy | tristanC: but we can still go with it - just to prove it's working. I mean even in non-static, worst case scenario the '{{ ansible_user_dir }}/src' doesn't exist anymore | 09:11 |
spredzy | s/anymore// | 09:11 |
*** zoli is now known as zoli|afk | 09:12 | |
*** zoli|afk is now known as zoli | 09:12 | |
spredzy | tristanC: I'll merge it | 09:13 |
tristanC | spredzy: i think that's fair | 09:13 |
ganeshrn | https://softwarefactory-project.io/r/#/c/13885/ <-- can someone please review and merge this PR | 09:14 |
* tristanC waiting for a recheck | 09:14 | |
tristanC | ganeshrn: done | 09:15 |
ganeshrn | tristanC: cool, thanks! | 09:15 |
tristanC | spredzy: seems to be good, 2018-10-09 09:15:54.352773 | TASK [Clean workspace] static | changed | 09:16 |
tristanC | spredzy: i rechecked a runc job to make sure base still works | 09:17 |
sfbender | Merged software-factory/sf-config master: zuul: adapt gateway rewrite for React interface https://softwarefactory-project.io/r/13727 | 09:17 |
tristanC | spredzy: all good, container | ok | 09:18 |
tristanC | spredzy: oops, actually: https://ansible.softwarefactory-project.io/logs/66/2266/d68989bfd8f34a572f383d42daaf22aaf037d2aa/check/tox-awx-ui/ddd2d79/log-classify.html | 09:19 |
spredzy | tristanC: we can use become: True at that stage | 09:19 |
spredzy | revoke-sudo hasn't been called yet | 09:20 |
spredzy | if I am not mistaken | 09:20 |
tristanC | spredzy: hum, but can zuul become on static node? revoke-sudo would be permanent... | 09:20 |
spredzy | correct sorry | 09:21 |
tristanC | if that's the case, then those files are leftovers which need to be manually cleaned | 09:22 |
spredzy | tristanC: can we not revoke-sudo ? | 09:22 |
tristanC | spredzy: yes sure, but then zuul won't be able to sudo on subsequent run... | 09:22 |
spredzy | at least until we work out why those files are root and have the necessary change in to make them zuul:zuul | 09:23 |
sfbender | Merged software-factory/sf-ci master: zuul: fix gateway test for React interface https://softwarefactory-project.io/r/13728 | 09:23 |
spredzy | tristanC: not following | 09:23 |
tristanC | spredzy: what's the owner of 35.230.187.160:/home/zuul/src/github.com/ansible/awx/awx.egg-info/PKG-INFO ? | 09:23 |
spredzy | if we don't revoke-sudo, so basically leave zuul in the sudoers file | 09:23 |
spredzy | what happens with the subsequent run? | 09:23 |
tristanC | spredzy: if you revoke sudo, then the zuul user of the static node won't be able to sudo | 09:23 |
spredzy | I said "not revoking" it | 09:23 |
spredzy | so zuul can always become: True | 09:24 |
spredzy | until we manage file permissions properly | 09:24 |
tristanC | spredzy: hum, then you need that job: https://review.openstack.org/#/c/593150/ | 09:25 |
tristanC | spredzy: but you shouldn't let zuul sudo on a static node, it should be restricted to user commands only, since sudo can easily leave persistent stuff behind... | 09:26 |
spredzy | Agree, but currently our CI is blocked because of tests currently not passing. What I'd like to get is tests passing so CI is unlocked, and then work to make it clean (proper permission, new base-job if necessary, ...) | 09:27 |
spredzy | So revoke-sudo should be put back by end of week | 09:27 |
tristanC | spredzy: in https://github.com/ansible/zuul-jobs/blob/master/zuul.d/jobs.yaml#L4, add a run: playbook that only uses the tox role | 09:28 |
tristanC | spredzy: similar to https://review.openstack.org/#/c/593150/1/playbooks/tox-with-sudo/run.yaml | 09:28 |
tristanC | spredzy: otherwise, the default tox jobs do: https://git.zuul-ci.org/cgit/zuul-jobs/tree/playbooks/tox/run.yaml | 09:29 |
tristanC | which calls revoke-sudo by default | 09:29 |
spredzy | Got it | 09:29 |
tristanC | spredzy: you don't want to put back revoke-sudo, you should just remove zuul from sudoers, no need to let the job do that | 09:29 |
spredzy | Do you have access to the node? | 09:29 |
tristanC | spredzy: nop | 09:30 |
tristanC | spredzy: https://github.com/ansible/zuul-config/pull/34 should work | 09:31 |
tristanC | i added ignore_errors to avoid failures when zuul is no longer sudoer | 09:32 |
tristanC | when zuul is no longer sudoer, then we should manually clean /home/zuul/src from all nodes and revert that pull/34 change | 09:32 |
*** nilashishc has quit IRC | 09:34 | |
spredzy | tristanC: my concern is that since its a static node, sudo has already been revoked as of now | 09:35 |
spredzy | So is zuul still in the sudoers for the next run ? | 09:35 |
tristanC | let's look at the logs, i think revoke-sudo only checks a special file | 09:35 |
tristanC | spredzy: https://ansible.softwarefactory-project.io/logs/66/2266/d68989bfd8f34a572f383d42daaf22aaf037d2aa/check/tox-awx-api/fc1b82f/ara-report/result/be45dc48-dbc8-4f17-b938-e85dc6da66a5/ | 09:36 |
spredzy | https://github.com/openstack-infra/zuul-jobs/blob/master/roles/revoke-sudo/tasks/main.yaml#L6-L11 | 09:36 |
spredzy | so currently zuul is not in the sudoers anymore | 09:37 |
tristanC | spredzy: heh :) then who has access to the node and can add zuul back? | 09:38 |
tristanC | spredzy: in the meantime, you should fix that awx-tox job to not do the revoke-sudo :) | 09:38 |
spredzy | https://github.com/ansible/zuul-jobs/pull/22/files | 09:39 |
spredzy | done | 09:39 |
spredzy | I have a PR from awx depending on that one | 09:39 |
spredzy | Let me see if I can get to the node | 09:39 |
spredzy | I am on the nodes | 09:42 |
spredzy | Updating sudoers.d | 09:42 |
tristanC | spredzy: ok, then don't forget about https://github.com/ansible/zuul-config/pull/34 , i'm going afk, bbl | 09:43 |
spredzy | merged | 09:44 |
spredzy | ack, catch you later | 09:44 |
spredzy | thanks for your help | 09:44 |
*** zoli is now known as zoli|lunch | 10:26 | |
*** zoli|lunch is now known as zoli | 10:26 | |
*** nilashishc has joined #softwarefactory | 11:27 | |
*** jangutter has quit IRC | 11:29 | |
*** jangutter has joined #softwarefactory | 11:30 | |
*** chandankumar has joined #softwarefactory | 12:19 | |
pabelanger | tristanC: thanks, just seen a job run on new executor. Do you have an ETA when we can start to scale out nodepool-launcher / nodepool-builders? Also, we likely want to start thinking of dedicated zookeeper also | 12:33 |
pabelanger | tristanC: http://paste.openstack.org/show/731761/ | 12:39 |
pabelanger | seems console.html is no longer closing properly | 12:39 |
pabelanger | now when left over, and job finished, keeps saying --- END OF STREAM --- | 12:39 |
pabelanger | eg | 12:39 |
pabelanger | https://ansible-network.softwarefactory-project.io/zuul/stream/1dd29174f2044ec8834775e586886853?logfile=console.log | 12:39 |
sfbender | Merged software-factory/sf-config master: config-update: skip host only running mergers or executors https://softwarefactory-project.io/r/13272 | 13:15 |
tristanC | pabelanger: would you mind opening a story with suggestion/improvement? | 13:16 |
tristanC | pabelanger: no eta for more scale out, i don't know if this has even been planned | 13:16 |
pabelanger | okay, we're going to need a scale out soon. nodepool-launcher doesn't handle multiple providers well | 13:18 |
pabelanger | I suspect it is already at 100% cpu | 13:18 |
tristanC | pabelanger: zs, which runs zookeeper, nodepool-launcher and zuul-scheduler is currently at 91% idle: https://softwarefactory-project.io/grafana/?panelId=28239&fullscreen&orgId=1&var-datasource=default&var-server=zs.softwarefactory-project.io&var-inter=$__auto_interval_inter&from=now-24h&to=now | 13:20 |
pabelanger | tristanC: how many cores is the server? | 13:22 |
pabelanger | iirc nodepool-launcher isn't multi core | 13:23 |
tristanC | pabelanger: i think the issue is that nodepool tries every provider for each request, even the ones that don't have the requested label registered | 13:23 |
pabelanger | yah, that sounds familiar | 13:23 |
tristanC | pabelanger: isn't nodepool multi-threaded? | 13:23 |
pabelanger | all i know, the more providers we add to nodepool-launcher, the slower it will eventually get | 13:23 |
tristanC | pabelanger: which i think can be fixed in nodepool by being more clever about assigning providers to requests | 13:24 |
tristanC | pabelanger: zs has 4 vcpu | 13:25 |
pabelanger | Yup, I think it should also be fixed. but for now, scaling out nodepool-launchers per tenant, could be another option | 13:25 |
pabelanger | but understand that means updates for sf | 13:25 |
tristanC | pabelanger: yes, and a bad user experience as each launcher needs a custom configuration... | 13:25 |
pabelanger | yah, upstream we have per-launcher configs | 13:26 |
tristanC | pabelanger: another neat improvement would be to have per-tenant launcher, then it would make sense to run a nodepool-launcher on ansible-network.sf-project.io | 13:28 |
pabelanger | tristanC: right, I suggested that to nhicher last week. If we could run our own launcher, that would be great. But means it would need access to zookeeper | 13:29 |
pabelanger | something I'd like to discuss in Berlin | 13:30 |
pabelanger | along with regional zuul-executors, I think rcarrillocruz is going to pick that up again | 13:30 |
*** nilashishc has quit IRC | 14:37 | |
*** zoli is now known as zoli|gone | 16:04 | |
*** zoli|gone is now known as zoli | 16:04 | |
pabelanger | on the list of things to automate | 17:11 |
pabelanger | wrong window | 17:12 |
*** sfbender has quit IRC | 19:56 | |
spredzy | is it me or RDO-cloud seems down? | 19:58 |
nhicher | spredzy, pabelanger: rdocloud outage, sf and review.rdo are not reachable | 20:04 |
pabelanger | nhicher: thanks | 20:04 |
pabelanger | really looking forward to the day the control plane moves out of rdocloud | 20:04 |
spredzy | merci nico | 20:06 |
spredzy | Hopefully, with Ansible (and ideally more teams) relying on it, that will help get funding for that to happen | 20:07 |
nhicher | spredzy, pabelanger rdocloud is back | 20:57 |
matburt | looks like our jobs are hung on the ansible tenant :/ | 22:51 |
*** sshnaidm is now known as sshnaidm|afk | 23:32 | |
dmsimard | I'd look but I don't have access to that one :/ | 23:50 |
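
Editor's sketch, not taken from the log or from nodepool itself: the 13:23 to 13:24 exchange suggests that a launcher could skip providers that do not register the label a request asks for, instead of trying every provider. A minimal Python illustration of that idea follows. The `Provider` and `NodeRequest` classes, the `candidate_providers` function, and all provider and label names are hypothetical and do not reflect nodepool's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    # Hypothetical stand-in for a provider entry; not nodepool's real class.
    name: str
    labels: frozenset


@dataclass(frozen=True)
class NodeRequest:
    # Hypothetical stand-in for a node request carrying the labels it needs.
    request_id: str
    labels: tuple


def candidate_providers(request, providers):
    """Return only the providers that register every label in the request."""
    wanted = set(request.labels)
    return [p for p in providers if wanted <= p.labels]


# Made-up providers and labels, for illustration only.
providers = [
    Provider("static-awx", frozenset({"awx-static"})),
    Provider("cloud-a", frozenset({"centos-7", "fedora-28"})),
    Provider("cloud-b", frozenset({"centos-7"})),
]

request = NodeRequest("req-1", ("centos-7",))
names = [p.name for p in candidate_providers(request, providers)]
print(names)
```

With a pre-filter like this, a request would only be offered to providers that can possibly satisfy it, so adding unrelated providers would not add work per request. Whether nodepool can adopt such a filter depends on how its request handling is structured, which the log does not describe.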