*** jamesmcarthur has joined #zuul | 00:01 | |
*** jamesmcarthur has quit IRC | 00:04 | |
*** jamesmcarthur has joined #zuul | 00:04 | |
*** zxiiro has quit IRC | 00:06 | |
*** y2kenny has joined #zuul | 00:26 | |
*** y2kenny has joined #zuul | 00:27 | |
*** jamesmcarthur has quit IRC | 00:29 | |
*** jamesmcarthur has joined #zuul | 00:32 | |
*** tosky has quit IRC | 00:32 | |
*** Goneri has quit IRC | 00:53 | |
*** jamesmcarthur has quit IRC | 01:02 | |
*** jamesmcarthur has joined #zuul | 01:06 | |
*** jamesmcarthur has quit IRC | 02:01 | |
*** ianw has quit IRC | 02:30 | |
*** ianw has joined #zuul | 02:36 | |
*** jamesmcarthur has joined #zuul | 02:39 | |
*** ianw has joined #zuul | 02:40 | |
*** jamesmcarthur has quit IRC | 02:46 | |
*** swest has quit IRC | 02:56 | |
*** jamesmcarthur has joined #zuul | 03:02 | |
*** jamesmcarthur has quit IRC | 03:06 | |
*** jamesmcarthur has joined #zuul | 03:08 | |
*** swest has joined #zuul | 03:12 | |
*** jamesmcarthur has quit IRC | 03:14 | |
*** saneax has joined #zuul | 03:19 | |
*** bhavikdbavishi has joined #zuul | 03:28 | |
*** bhavikdbavishi1 has joined #zuul | 03:31 | |
*** bhavikdbavishi has quit IRC | 03:32 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:32 | |
*** jamesmcarthur has joined #zuul | 03:38 | |
*** jamesmcarthur has quit IRC | 03:47 | |
*** dmsimard|off has joined #zuul | 04:18 | |
*** bhavikdbavishi has quit IRC | 04:27 | |
*** jamesmcarthur has joined #zuul | 04:41 | |
*** bhavikdbavishi has joined #zuul | 04:43 | |
*** jamesmcarthur has quit IRC | 04:54 | |
*** sgw has quit IRC | 04:59 | |
*** y2kenny has quit IRC | 04:59 | |
*** jamesmcarthur has joined #zuul | 05:05 | |
*** jamesmcarthur has quit IRC | 05:07 | |
*** jamesmcarthur has joined #zuul | 05:11 | |
*** jamesmcarthur has quit IRC | 05:33 | |
*** evrardjp has quit IRC | 05:36 | |
*** evrardjp has joined #zuul | 05:36 | |
*** bhavikdbavishi has quit IRC | 05:56 | |
*** bhavikdbavishi has joined #zuul | 06:07 | |
*** saneax has quit IRC | 06:46 | |
*** saneax has joined #zuul | 07:06 | |
*** bhavikdbavishi has quit IRC | 07:15 | |
*** smyers has quit IRC | 07:42 | |
*** smyers has joined #zuul | 07:43 | |
*** dpawlik has joined #zuul | 07:47 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't accept nodes for unknown requests https://review.opendev.org/714852 | 07:56 |
*** bhavikdbavishi has joined #zuul | 07:59 | |
*** jcapitao has joined #zuul | 08:05 | |
*** tosky has joined #zuul | 08:24 | |
*** bhavikdbavishi has quit IRC | 08:39 | |
*** jpena|off is now known as jpena | 08:54 | |
*** sshnaidm|afk is now known as sshnaidm | 09:15 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't accept nodes for unknown requests https://review.opendev.org/714852 | 09:37 |
*** sshnaidm has quit IRC | 10:18 | |
*** bhavikdbavishi has joined #zuul | 10:18 | |
*** sshnaidm has joined #zuul | 10:19 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: test fetch-sphinx-tarball role https://review.opendev.org/714912 | 10:32 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: test fetch-sphinx-tarball role https://review.opendev.org/714912 | 10:39 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: test fetch-sphinx-tarball role https://review.opendev.org/714912 | 10:50 |
zbr | here is a new challenge: how do we validate executors? today I found that fetch-sphinx-tarball was failing to run because unzip was missing on the executor. | 11:00 |
zbr | my attempt to test above is not going to help in this case, so how can we ensure that zuul-jobs roles keep working across various executors? | 11:01 |
*** ysandeep has joined #zuul | 11:05 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't accept nodes for unknown requests https://review.opendev.org/714852 | 11:15 |
*** jcapitao is now known as jcapitao_lunch | 11:18 | |
tobiash | zbr: which task fails? It downloads a tar.bz2 so it's surprising that it fails with a missing unzip: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/fetch-sphinx-tarball/tasks/html.yaml#L27 | 11:21 |
zbr | tobiash: here is what fixed it https://softwarefactory-project.io/r/#/c/17901/1/ansible/roles/sf-zuul/defaults/main.yml | 11:22 |
zbr | now the question is if we can find a way to avoid it | 11:22 |
zbr | ansible modules are full of weak-deps, so it does not surprise me that they can fail when run on different platforms. | 11:23 |
*** tosky is now known as tosky_ | 11:24 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Install unzip on all platforms https://review.opendev.org/714919 | 11:26 |
tobiash | zbr: that's a hard problem, however this should fix this specific issue ^ | 11:27 |
tobiash | it wasn't listed as a binary dependency for dpkg based distributions | 11:27 |
*** yolanda has quit IRC | 11:27 | |
zbr | tobiash: the failure was on centos-8, so that's not going to fix it. | 11:28 |
tobiash | zbr: then you're probably not using bindep when installing the executor? | 11:28 |
tobiash | because unzip as well as bzip2 are listed in bindep.txt for rpm based distros | 11:29 |
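For readers unfamiliar with bindep: the entries tobiash refers to would look roughly like this (a hedged sketch; the actual contents of zuul's bindep.txt should be checked in the repo). The bracketed selector restricts an entry to a platform family, which is why the dpkg side could be missed:

```text
# bindep.txt entries limited to rpm-based platforms
unzip [platform:rpm]
bzip2 [platform:rpm]
```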
zbr | tobiash: i have no idea about that, i am not involved with it. i only discovered the issue today when a colleague reported tox-docs failing on rdo inside the fetch-... role. | 11:29 |
tobiash | I guess tristanC should know ^ | 11:30 |
zbr | yeah, for sure. | 11:30 |
zbr | it's not urgent, but it would be a good idea not to forget about the subject; i am more worried that other similar things may slip | 11:30 |
zbr | if we have a way to validate executors we could prevent it in the future | 11:31 |
*** yolanda has joined #zuul | 11:32 | |
*** sshnaidm has quit IRC | 11:34 | |
*** sshnaidm has joined #zuul | 11:34 | |
*** tosky_ is now known as tosky | 11:40 | |
tristanC | tobiash: zbr: indeed, we missed unzip as an executor requirement | 11:58 |
*** decimuscorvinus has quit IRC | 12:03 | |
tristanC | zbr: hmm, unzip and bzip2 are installed on our executor, what was the issue? | 12:05 |
tristanC | actually it's a bit odd for the zuul executor to carry requirements of specific zuul-jobs like tox-docs, at least in the bindep scope. | 12:08 |
*** sshnaidm is now known as sshnaidm|afk | 12:08 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't accept nodes for unknown requests https://review.opendev.org/714852 | 12:08 |
*** sshnaidm|afk is now known as sshnaidm | 12:24 | |
*** rlandy has joined #zuul | 12:24 | |
*** jcapitao_lunch is now known as jcapitao | 12:29 | |
*** jpena is now known as jpena|lunch | 12:29 | |
*** rfolco has joined #zuul | 12:30 | |
avass | this is strange... added a new tenant yesterday and trying to make sure everything works before I let people start using it. but the executor is checking out master instead of the change I'm testing with, so it errors because it can't find the playbook | 12:34 |
avass | not sure why it does that | 12:34 |
avass | the exact same change works in another tenant with the same 'check' pipeline configuration | 12:36 |
avass | Oh nevermind I think I found the problem | 12:39 |
*** bhavikdbavishi has quit IRC | 12:54 | |
*** Goneri has joined #zuul | 12:55 | |
avass | anyway, I think I've found a problem with the structure of the executor-git directory | 13:01 |
*** hashar has joined #zuul | 13:01 | |
zbr | tobiash: tributarian: apparently my attempt to test the docs fetching proved useful, have a look at https://review.opendev.org/#/c/714912/ | 13:01 |
avass | it doesn't differentiate between host abc.com port A and host abc.com port B if two different gerrit instances are running on the same server | 13:01 |
avass | so what happens if there are two projects with the same name on two different gerrit instances running on the same server? | 13:03 |
*** sgw has joined #zuul | 13:09 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/714912 | 13:11 |
openstackgerrit | Merged zuul/nodepool master: nodepool-zuul-functional: switch to editable install https://review.opendev.org/714788 | 13:14 |
*** decimuscorvinus has joined #zuul | 13:18 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/714912 | 13:25 |
*** ysandeep is now known as ysandeep|away | 13:31 | |
*** jpena|lunch is now known as jpena | 13:32 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/714912 | 13:40 |
*** hashar has quit IRC | 13:40 | |
*** hashar has joined #zuul | 13:41 | |
*** smcginnis has left #zuul | 14:00 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/714912 | 14:00 |
*** sgw has quit IRC | 14:02 | |
corvus | avass: do you mean the src_root (work/src/....) where the git repos are checked out inside the job? | 14:09 |
avass | corvus: I mean var/lib/zuul/executor-git/ | 14:18 |
avass | where the git repos are stored for the executors and mergers too i guess | 14:18 |
*** sgw has joined #zuul | 14:19 | |
corvus | avass: hrm, i thought that was stored by connection name | 14:21 |
avass | doesn't look like it unless that has changed recently | 14:22 |
corvus | avass: looks like you're right, and that's by canonical name. | 14:22 |
corvus | which would be the same as the src_root | 14:22 |
avass | yeah | 14:22 |
corvus | avass: so i guess your users clone repos from "https://review.example.com:1234/projectname" and "https://review.example.com:5678/projectname" ? | 14:23 |
avass | corvus: yep something like that, I'm not sure how gerrit is set up but I have a feeling something is not as it should be | 14:24 |
mordred | yeah - that's ... | 14:25 |
avass | :) | 14:25 |
avass | I'm not even sure why we have so many different gerrit instances | 14:26 |
corvus | avass: that all uses the "canonical" hostname -- i've never tried setting a port in there, but there is a possibility it might work (or perhaps it might only require a few changes) | 14:33 |
corvus | avass: https://zuul-ci.org/docs/zuul/reference/drivers/gerrit.html#attr-%3Cgerrit%20connection%3E.canonical_hostname | 14:34 |
corvus | avass: or, if it doesn't matter to you, you might be able to use an entirely synthetic hostname (like 1234-review.example.com) | 14:35 |
avass | corvus: I'll check that out later, it's not a problem at the moment. | 14:35 |
mordred | it sounds like review.example.com:1234 and review.example.com:5678 are the actual canonical hostnames in this case | 14:35 |
corvus | mordred: yes; avass: so i'd start by trying that ^ (i just am not completely sure there's no port parsing involved, but i don't think there is) | 14:36 |
avass | yeah that looks like it could work, since the url to gerrit's web dashboard is different from what's used when cloning repos for some reason | 14:39 |
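A hedged sketch of the two options corvus outlines, with invented connection names and hostnames; `canonical_hostname` is a documented Gerrit-driver attribute (linked above), but whether it tolerates an embedded port is exactly the open question, so the synthetic-hostname form is the safer variant:

```ini
# zuul.conf (illustrative only)
[connection gerrit-a]
driver=gerrit
server=review.example.com
canonical_hostname=1234-review.example.com

[connection gerrit-b]
driver=gerrit
server=review.example.com
canonical_hostname=5678-review.example.com
```

With distinct canonical hostnames, the two instances' repos no longer collide under the executor's git cache or the job's src_root.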
*** zxiiro has joined #zuul | 15:06 | |
zbr | corvus: mordred: what do you think about https://review.opendev.org/#/c/714912/ ? | 15:06 |
corvus | zbr: left a comment | 15:13 |
tristanC | zbr: that change results in ensure-tox requiring sudo, which breaks our zuul-jobs usage (Software Factory CI is failing) | 15:13 |
corvus | (that was more or less my comment :) | 15:13 |
tobiash | corvus: woot, we soon can replace the giant codeowners patch by this: https://developer.github.com/v4/enum/pullrequestreviewdecision/ | 15:14 |
tobiash | we just have to wait until it lands in github enterprise | 15:15 |
zbr | do we recognize that, the way the tox-docs job is defined, it breaks on systems without bzip2 and we do not catch the issue? enabling all-platform jobs uncovered 5 broken platforms. | 15:15 |
zbr | i am more interested in finding a way to detect and prevent it, hacking your node images is always an option. | 15:16 |
zbr | any ideas? | 15:16 |
zbr | with the lack of sudo this is quite an interesting challenge. | 15:17 |
mnaser | zbr: maybe we need to do more work of adding more jobs similar to how we did with ensure-tox, ensure-python? | 15:18 |
corvus | zbr: yes, i think we should solve the problem. how about using gzip? is that more universally pre-installed? | 15:19 |
corvus | zbr: if not, then at least the installation should happen in the tox-docs job, not for all tox jobs | 15:20 |
tristanC | zbr: bindep works great without sudo, we just have to install the package in advance, and then bindep skip install and works without root privilege | 15:21 |
zbr | bindep is in the repo; we cannot ask each user who wants to use tox-docs to add bzip2 to their bindep.txt (if they even have one) | 15:21 |
zbr | for openstack repos it's not a big deal, but i am concerned about scalability; i would like to make it easy for others to use it. | 15:23 |
zbr | if we add this to tox-docs, how do we test the fetch-sphinx-tarball role? do we assume the testing of this role requires a run of tox-docs? ok for me. | 15:24 |
corvus | zbr: that is how you wrote the test job | 15:24 |
zbr | btw, this bzip2 (or unzip) requirement is more generic than you think; it's a requirement for using the unarchive module in ansible, quite generic I would say. | 15:25 |
zbr | my personal preference is to make each role as contained as possible, so each one would install its own requirements, when they are needed, but this does not play well with zuul roles. | 15:26 |
zbr | we should use role dependencies with ansible because ansible is quite smart about them, knowing not to run them twice. | 15:26 |
corvus | zbr: we want installation to happen in pre-playbooks | 15:27 |
tristanC | corvus: gzip is more common than bzip2. We had an issue because of a missing bzip2 after a re-installation of our executor, and it seems like tox-docs requires bzip2 on the executor. We fixed that by adding 'Requires: bzip2' to the zuul executor package spec | 15:28 |
zbr | corvus: ok-ish, but it breaks reusability of the roles; once you assume that a role works only when a specific playbook was run, you end up with a role that is not really self-contained. | 15:28 |
mordred | it seems like in the vm-only world having jobs be completely self-contained is easy ... but we run afoul of running some of these in rootless containers - largely because we don't then have a good way to express "this job needs to run in a container that has $stuff" | 15:30 |
*** arxcruz|rover is now known as arxcruz | 15:30 | |
zbr | maybe we need to make some kind of validation test suite for nodes (vms or containers), where we can try running all our tests on them and state what is broken or not. | 15:31 |
corvus | zbr: i don't think that will ever be possible | 15:31 |
corvus | some roles are just going to have to require things be installed out-of-band and should document that. for example, a role that does something with afs. | 15:31 |
corvus | the more common roles, however, i agree we should try to make work more easily in most circumstances | 15:32 |
zbr | i am more concerned about "common" roles too | 15:32 |
corvus | so i think switching to gzip may help that. but i still think that we should not add a sudo requirement to the ensure-tox role since not all tox usage requires it. if we want to add something like that, then we should add a new "ensure" role and run it in the pre-playbook for tox-docs. | 15:33 |
corvus | presumably that will not break anyone since if they are running tox-docs they should have bzip2/gzip | 15:33 |
mordred | corvus: yeah- like "ensure-sphinx-deps" | 15:34 |
mordred | or something | 15:34 |
zbr | it is going to be fun, as I see a huge number of "ensure-*" stuff appearing soon | 15:34 |
mordred | and then - even for the container case - if it has something like "package: bzip2 state: present" - if it's there it's a no-op and if it can't install it because no sudo - it can at least give an error about that so that it's clear the job is running on an unsuitable node | 15:35 |
corvus | yep | 15:36 |
tristanC | isn't `become: true` evaluated before the task action? | 15:36 |
zbr | i wonder if it would not be easier to have a meta-ensure that receives a list of stuff the node needs to have. | 15:36 |
mordred | corvus: makes me want the ability for a pre-playbook to assert a fatal condition to avoid re-runs ... so a pre-playbook ensure role could say "I don't have bzip2 or sudo - I can't continue" | 15:36 |
*** jamesmcarthur has joined #zuul | 15:37 | |
mordred | tristanC: it might - so we might have to do those tasks a little more verbosely | 15:37 |
corvus | mordred, tristanC: maybe we should make a package install role to address both problems? | 15:37 |
mordred | yeah | 15:37 |
mordred | "if I have the package, do nothing, if I don't have the package and I also don't have sudo, assert a fatal error, otherwise install the package" | 15:37 |
corvus | zbr: ^ is that like your meta-ensure idea? | 15:38 |
zbr | i would not use the term package, more of "tool", as the way of installing it may differ per tool. usually it would be a system package. | 15:38 |
mordred | I agree - I think the "this is how you install it" can be abstract - but the main chunks / algorithm should be about the same | 15:39 |
mordred | "how do I find out if I have it" "how do I install it" "can I install it" | 15:39 |
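The algorithm mordred sketches ("do I have it / can I install it / install it") could look roughly like the following Ansible tasks. Everything here is illustrative, including the role and variable names, and the explicit pre-checks sidestep tristanC's concern about `become: true` being evaluated on unsuitable nodes, since escalation only happens on a task we already know can run:

```yaml
# Hypothetical tasks for an "ensure-package"-style role.
- name: Check whether the package is already present
  command: "rpm -q {{ tool_package }}"    # or dpkg -s, per platform
  register: pkg_check
  failed_when: false
  changed_when: false

- name: Check whether we can escalate privileges
  command: sudo -n true
  register: sudo_check
  failed_when: false
  changed_when: false
  when: pkg_check.rc != 0

- name: Fail clearly when the node is unsuitable
  fail:
    msg: "{{ tool_package }} is missing and sudo is unavailable"
  when: pkg_check.rc != 0 and (sudo_check.rc | default(1)) != 0

- name: Install the package when we can
  become: true
  package:
    name: "{{ tool_package }}"
    state: present
  when: pkg_check.rc != 0 and (sudo_check.rc | default(1)) == 0
```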
zbr | can zuul merge dictionaries on inheritance? i wonder if we can declare tool requirements, so users would be able to declare what is needed by their jobs. | 15:39 |
tristanC | well, the restricted node use-case, e.g. where sudo doesn't work, is a poor user experience, and we are actually replacing them by kubernetes pods | 15:43 |
mordred | tristanC: good! that does seem like a better use case | 15:43 |
zbr | it would be cool if the user could declare a list of tags/labels that define requirements for running a job, and zuul would auto-pick a node that matches, or install missing ones on an existing node. | 15:44 |
zbr | but that is far-far away | 15:44 |
*** bhavikdbavishi has joined #zuul | 15:45 | |
tristanC | mordred: though, in kubernetes, sudo doesn't work either because we don't allow new privilege, so the jobs are running as root, which needs another hack to mitigate the `revoke-sudo` role... this is documented in this container file: https://softwarefactory-project.io/cgit/config/tree/containers/centos-7/Dockerfile#n26 | 15:45 |
tristanC | well it's not much better, but at least vanilla zuul-jobs now works in container | 15:49 |
tristanC | now, it's a bit inefficient to use any ensure-* or bindep role with containers; i'd rather let projects provide container images with their bindep, and make the job run the command directly | 15:50 |
zbr | tristanC: i have to confess that i loved the few jobs i've seen using containers, starting instantly. | 15:51 |
clarkb | rather than hack revoke-sudo, why not update the role to check if it can sudo, and if it can't, it's a noop because there is nothing to revoke? | 15:51 |
clarkb | that seems like a really simple fix | 15:52 |
tristanC | clarkb: but then you can't use tasks that have `become: true` | 15:52 |
clarkb | that's true regardless, you don't have sudo | 15:53 |
mordred | right - but jobs expect become: true to fail after they've run revoke-sudo | 15:53 |
mordred | seems like updating revoke-sudo to be able to be used like an assert if sudo is already not there would be a nice improvement | 15:54 |
tristanC | clarkb: mordred: probably, i just ran out of time to fix that in zuul-jobs :) | 15:54 |
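A hedged sketch of what that improvement could look like (the included file name is invented; the real revoke-sudo role in zuul-jobs would need adapting):

```yaml
# Hypothetical: turn revoke-sudo into a no-op when sudo is already absent.
- name: Check whether the zuul user can sudo at all
  command: sudo -n true
  register: sudo_available
  failed_when: false    # missing or denied sudo is not an error here
  changed_when: false

- name: Remove sudo access only when it exists
  include_tasks: remove-sudo-access.yaml   # hypothetical task file
  when: sudo_available.rc == 0
```

Jobs that rely on `become: true` failing after revoke-sudo keep that guarantee either way: sudo is absent or has been removed.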
mnaser | maybe some sort of classification of roles is something we should start looking into | 15:58 |
mnaser | a basic one would be like: "requires-root-access", and then zuul autodoc stuff can traverse and mark jobs that run/use that role in the docs as requiring root access | 15:58 |
mnaser | because it's inevitable that we'll have things that will require root access no matter what | 15:59 |
*** bhavikdbavishi1 has joined #zuul | 15:59 | |
tristanC | my point is, when running in containers, the bindep, ensure-* and sudo management are not necessary, and these tasks can really slow down fast jobs | 16:00 |
*** bhavikdbavishi has quit IRC | 16:00 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 16:00 | |
clarkb | tristanC: making them noop should be pretty fast too... | 16:01 |
tristanC | for example, most of our tox-linters jobs currently take 150 seconds to complete, and they could probably run in under a minute without them | 16:01 |
tristanC | here is an example: https://softwarefactory-project.io/logs/10/17910/1/check/tox-pep8/1eb5811/ara-report/ `revoke-sudo : Check if zuul is sudoer` takes 11 seconds, `tox : .*` takes 58 seconds | 16:05 |
clarkb | tristanC: why does `sudo -n true` take 11 seconds? | 16:07 |
clarkb | is that an ansible connection startup cost? this is the first task in a new play I think | 16:07 |
tristanC | clarkb: that's a good question :) | 16:07 |
clarkb | because sudo -n true does not take that long here | 16:08 |
clarkb | real0m0.011s | 16:08 |
tristanC | that doesn't seem like a startup cost; the previous play performs `ensure-tox : Ensure tox is installed` in under a second | 16:11 |
clarkb | tristanC: each playbook is new startup though | 16:12 |
clarkb | (that is why I wondered if this is startup cost) | 16:12 |
tristanC | well, i meant, user providing ready to use container, which is quite common, could avoid all the cost associated with running ensure-* tasks/play | 16:14 |
*** jcapitao is now known as jcapitao_afk | 16:14 | |
fungi | but if the whole playbook doesn't go away, that startup cost (if that's what the bulk of the time represents) doesn't disappear | 16:15 |
clarkb | right thats why I'm curious to debug this further | 16:15 |
clarkb | because that 11 seconds may not go away after removing the checks | 16:15 |
fungi | also 11 seconds seems like a very long time to authenticate over ssh and start a python process | 16:17 |
fungi | there's got to be other problems at work there | 16:17 |
clarkb | fungi: it also has to parse all the yaml, generate internal task lists for each host, and start processing them | 16:17 |
clarkb | still, 11 seconds is a long time for that, but identifying where things cost more than we expect and properly debugging them is likely to be beneficial | 16:18 |
fungi | "all the yaml" for the playbook? | 16:18 |
clarkb | fungi: ya | 16:18 |
fungi | (and included roles i guess) | 16:18 |
clarkb | fungi: also that includes jinja templating | 16:18 |
clarkb | and we know jinja templating is slow | 16:18 |
clarkb | as we discovered with our experiments in dynamic inventory | 16:18 |
fungi | i thought its slowness was linear with the size of the document though | 16:18 |
clarkb | fungi: the yaml is but not the jinja necessarily | 16:19 |
clarkb | since a one-liner jinja expression could be expensive | 16:19 |
fungi | well, i mean, jinja templating on the whole is not expensive if the number of substitutions is small and the operations requested are trivial | 16:19 |
fungi | i've seen jinja templating in, e.g., sphinx extensions work more or less instantaneously | 16:20 |
tristanC | let's see, i've submitted an identical change without the revoke-sudo role | 16:22 |
tristanC | and the run play first task completed almost instantly, so it's probably related to using the revoke-sudo role without sudo: https://softwarefactory-project.io/logs/18/17918/1/check/tox-pep8/50cc3a4/ara-report/ | 16:25 |
clarkb | if I run just that task in a playbook against localhost it takes ~3 seconds wall time. If I disable fact gathering it takes ~1 second | 16:26 |
clarkb | I expect that fact gathering might be the major cost here and that is setup cost | 16:26 |
clarkb | and in that case simply removing the role won't dramatically speed things up | 16:27 |
zbr | fungi: clarkb: the time is wasted doing the implicit gather_facts, unless the user disables it. not sure how fact caching is configured in zuul. | 16:27 |
tristanC | back to the initial problem: it seems like there is already a strong assumption about how a role can be used (e.g. ensure-tox > tox > fetch-tox), and that's fine because it enables custom usage, e.g. using `tox` directly. | 16:27 |
clarkb | zbr: yup I think that is what I've just discovered lcoally, thanks for the confirmation | 16:27 |
zbr | clarkb: that is one of the reasons I usually start any new playbook by disabling gathering, and gather only the facts that I need in each role. | 16:28 |
zbr | with conditions: if xxx is not defined, gather xxx; very fast. | 16:28 |
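The pattern zbr describes might look roughly like this (a sketch; `gather_subset` is a real option of the `setup` module, the rest is illustrative):

```yaml
- hosts: all
  gather_facts: false            # skip the expensive implicit setup
  tasks:
    - name: Gather only a minimal fact subset (os/python; no network scan)
      setup:
        gather_subset:
          - '!all'
          - min
      when: ansible_distribution is not defined   # skip if already gathered
```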
clarkb | tristanC: Run tox without tests may be capturing that time cost for you in the new run | 16:29 |
zbr | usually facts are cached between runs, so it will take a long time only the first time. | 16:29 |
clarkb | since fail, stat, and set_fact may not imply fact gathering | 16:29 |
zbr | if i remember well, networking information is prone to take the most time | 16:29 |
tristanC | clarkb: tox without tests went from 51 to 49 | 16:29 |
clarkb | tristanC: http://paste.openstack.org/show/791151/ that is what I'm observing | 16:31 |
clarkb | and that isn't doing anything special | 16:31 |
clarkb | if I let it gather facts the runtime is significantly longer (and zbr supports that theory too) | 16:31 |
clarkb | if I run the task twice the runtime of the play increases about 150ms | 16:32 |
clarkb | (implying the task itself is never the major cost) | 16:32 |
tristanC | clarkb: i observe the same thing locally. but within zuul executor context, on that special node which doesn't have sudo, the revoke-sudo adds 11 seconds to a 58 second play: https://softwarefactory-project.io/logs/10/17910/1/check/tox-pep8/1eb5811/ara-report/ vs https://softwarefactory-project.io/logs/18/17918/1/check/tox-pep8/50cc3a4/ara-report/ | 16:33 |
tristanC | perhaps it's dns, or how jinja gets evaluated, or something else. but in any case, shouldn't a user who doesn't have sudo, or simply doesn't care, be able to skip the revoke-sudo role entirely? | 16:35 |
tristanC | if not for the runtime cost, then at least for report visibility, where we don't need to see those tasks being ok or skipped | 16:35 |
clarkb | tristanC: maybe, but that's orthogonal to your goal of "make things fast" if the cost is independent of sudo | 16:35 |
clarkb | which is what I'm trying to get you to consider | 16:36 |
clarkb | the goal I heard was make things faster | 16:36 |
clarkb | not "stop checking sudo" | 16:36 |
clarkb | experimentation is showing that startup costs are a big cost, but simple tasks that check the presence of tools/permissions are not | 16:37 |
tristanC | oh well, i mentioned that need for speed as we were talking about adding an ensure-tool role to set up bzip2 | 16:37 |
clarkb | that tells me we need to reconsider how to make ansible quicker in general and not optimize specific roles | 16:37 |
clarkb | as for optimizing specific roles, I think the jobs in zuul-jobs should accommodate a range of use cases. I think the best way to do that is to not rip functionality out, but to noop when appropriate | 16:39 |
tristanC | clarkb: yeah i understood, and appreciate the consideration for unnecessary cost :) though in that case, it doesn't seem to be caused by startup or facts | 16:39 |
clarkb | tristanC: is it possible that sudo is slow on your platform due to pam nsswitch or whatever other things sudo has to lookup? | 16:40 |
tristanC | even weirder, the `Check if zuul is sudoer` task is actually failing with "[Errno 2] No such file or directory", as sudo is not installed | 16:40 |
clarkb | (you mentioned dns which sudo does resolve too) | 16:40 |
clarkb | hrm that should fail very fast :) | 16:41 |
clarkb | but let me update my local test to see if ansible is being weird when command does not exist | 16:41 |
clarkb | nope still about 1 second | 16:42 |
tristanC | using a `sudo2` typo didn't make a difference locally either | 16:42 |
clarkb | zbr: do you know if there are any ansible instrumenters? | 16:48 |
clarkb | if so that could provide interesting data and maybe help tristanC figure out the difference between zuul and local runs of this task | 16:48 |
zbr | clarkb: no idea, but you can safely skip playing with mitogen (to save you some time) | 16:49 |
zbr | fact gathering is a documented major source of delays, the no. 1 in fact, but it is ameliorated by two things: fact caching and ssh connection caching. | 16:50 |
clarkb | I believe zuul does fact caching | 16:50 |
*** jcapitao_afk is now known as jcapitao | 16:50 | |
clarkb | and it uses the ssh control persistent master processes to keep tcp connection open for ssh | 16:50 |
zbr | we could speed things up considerably by disabling the default gather and replacing it with a custom one that reads only the minimal facts: python version and os version. | 16:52 |
zbr | gathering costs a lot because it does a lot of things | 16:52 |
mnaser | i think if we do that we make the user experience a lot harder | 16:55 |
mnaser | and gathering facts only happens once and probably takes a few seconds in the grand scheme of the whole job | 16:56 |
*** rfolco is now known as rfolco|bbl | 16:58 | |
clarkb | mnaser: for jobs that extensively use ansible it's definitely worthwhile | 16:58 |
clarkb | but as tristanC mentions, there is a subset of jobs that have minimal requirements and should run quickly, but in some cases don't | 16:59 |
clarkb | we probably can improve the runtime of those jobs with changes to how we use ansible (though I don't think we've fully identified the costs yet) | 16:59 |
tristanC | clarkb: i haven't noticed a big cost from using the default fact gathering. what matters most imo is the number of tasks and plays, both for performance and for usability of the report | 17:00 |
corvus | fact gathering should happen right at the start of the build, before the first playbook. | 17:00 |
tristanC | what i meant is that adding an `ensure-tool` role shouldn't bother with missing sudo, as container environments (which likely do not have sudo) shouldn't have to run ensure tasks in the first place | 17:02 |
corvus | (during zuul's built-in setup phase) | 17:02 |
corvus | tristanC: i think mordred suggested that we may need to determine whether we can run sudo because of the problem you pointed out with "become:true" being evaluated early | 17:02 |
tristanC | mea culpa, i guess we are the main user of zuul-jobs without sudo, and it is actually causing a bad user experience when jobs suddenly use "become:true", thus we are actively moving away from such sudoless nodes | 17:04 |
*** jamesmcarthur has quit IRC | 17:12 | |
*** jamesmcarthur has joined #zuul | 17:13 | |
*** jamesmcarthur has quit IRC | 17:24 | |
*** bhavikdbavishi has quit IRC | 17:24 | |
*** jamesmcarthur has joined #zuul | 17:24 | |
openstackgerrit | Merged zuul/nodepool master: Enable setting label and instance name separately https://review.opendev.org/712666 | 17:34 |
*** evrardjp has quit IRC | 17:36 | |
*** evrardjp has joined #zuul | 17:36 | |
*** leathekd has joined #zuul | 17:48 | |
leathekd | Is it possible to have a tenant or jobs that are only visible to authenticated users? I feel like it's possible depending on how web servers are deployed but don't want to make assumptions. | 17:57 |
corvus | leathekd: yes, if you use the 'tenant whitelabel' setup for the web ui, that should be possible. | 17:59 |
*** jpena is now known as jpena|off | 18:00 | |
*** jamesmcarthur has quit IRC | 18:01 | |
*** jamesmcarthur has joined #zuul | 18:03 | |
leathekd | corvus: excellent, thanks | 18:06 |
fungi | though that assumes your git repositories containing the job definitions and configuration are also only visible to authenticated users | 18:06 |
*** jamesmcarthur has quit IRC | 18:08 | |
leathekd | fungi: that's probably true in this case, but, out of curiosity, why is that a requirement? | 18:12 |
fungi | leathekd: because your job definitions won't be secret if they're hosted in a public service. zuul can't solve that part for you | 18:12 |
fungi | it's not technically a requirement from zuul's perspective, just a logistical requirement of the use case | 18:13 |
leathekd | Gotcha, thanks. | 18:14 |
fungi | zuul needs you to host git repositories somewhere it can reach, that's zuul's requirement. providing a hosting location which meets your privacy needs is up to you though | 18:14 |
*** jamesmcarthur has joined #zuul | 18:15 | |
*** jamesmcarthur has quit IRC | 18:22 | |
*** jamesmcarthur has joined #zuul | 18:23 | |
*** jamesmcarthur has quit IRC | 18:25 | |
*** jamesmcarthur has joined #zuul | 18:25 | |
*** irclogbot_2 has quit IRC | 18:37 | |
*** rfolco|bbl is now known as rfolco | 18:39 | |
*** jcapitao has quit IRC | 18:44 | |
mnaser | let me submit a patch | 18:44 |
mnaser | corvus, zbr: did we reach some sort of quorum on deciding what to do with fetch-sphinx-tarball ? | 18:46 |
mnaser | it's currently blocking some stuff here so i'd like to drive a way forward | 18:46 |
mnaser | gzip seems reasonable | 18:46 |
mnaser | my local testing shows that gzip is a good idea https://www.irccloud.com/pastebin/92p90sFD/ | 18:49 |
clarkb | the issue is that we use bzip2 in ensure-python? | 18:55 |
mnaser | clarkb: nope the issue is that in fetch-sphinx-output, we rely on bzip2 | 18:55 |
mnaser | which means that it needs to exist in the executor node as well | 18:55 |
clarkb | mnaser: can you link to that? I'm completely failing at grepping, and grep says bzip2 is in ensure-python and nowhere else | 18:56 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/715028 | 18:56 |
mnaser | clarkb: i proposed an alternative built off of zbr work which shows off the issue (and a fix) | 18:56 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/715028 | 18:57 |
clarkb | oh is it purely the suffix? | 18:58 |
*** jamesmcarthur has quit IRC | 18:58 | |
clarkb | that change isn't changing the tool used | 18:58 |
clarkb | unless I guess tar is inferring it | 18:58 |
mnaser | i think tar does infer it (and so does the 'unarchive' module too) | 18:59 |
clarkb | ya it's not, it's explicitly using -j | 18:59 |
clarkb | you need to update the tar command to use -z | 18:59 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/715028 | 19:00 |
mnaser | clarkb: ^ that? | 19:00 |
clarkb | mnaser: no you have to remove the j | 19:00 |
clarkb | -j means use bzip | 19:00 |
clarkb | -z means use gzip | 19:00 |
mnaser | oh. | 19:00 |
mnaser | that totally makes sense | 19:00 |
mnaser | i wonder why j => bzip | 19:00 |
clarkb | (and now I understand why my grepping failed, it's because tar is invoking it based on the flag) | 19:00 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: Add testing of fetch-sphinx-tarball role https://review.opendev.org/715028 | 19:01 |
clarkb | yes that should do what you want now | 19:01 |
mnaser | clarkb: do you want to prepare opendev for https://review.opendev.org/#/c/712804/ or it should be pretty much noop given it's using system tox? | 19:02 |
*** irclogbot_3 has joined #zuul | 19:03 | |
clarkb | mnaser: it should be a noop for us | 19:03 |
*** jamesmcarthur has joined #zuul | 19:06 | |
clarkb | mnaser: on the -x side, to inflate you had to provide the matching -z, -j, -J etc to match the file type in the past. But now that is inferred based on the file itself. I thought that -c might do the same, but in this case we were explicit either way | 19:08 |
clarkb | -c may not do the inference because it is perfectly valid to tar and not compress | 19:08 |
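The flag behaviour discussed above can be sketched as follows (a minimal demo assuming GNU tar; the docs-html file names are illustrative only): -z selects gzip and -j selects bzip2 when creating an archive, while a plain -xf auto-detects the compression when extracting.

```shell
# Scratch tree to archive (names are illustrative only).
mkdir -p /tmp/tarflags-demo/docs-html
echo '<html></html>' > /tmp/tarflags-demo/docs-html/index.html
cd /tmp/tarflags-demo

# -z selects gzip on create; -j would select bzip2 instead,
# but that requires the bzip2 binary on the node, which is
# exactly the availability problem discussed above.
tar -czf docs-html.tar.gz docs-html
if command -v bzip2 >/dev/null; then
    tar -cjf docs-html.tar.bz2 docs-html
fi

# On extract, modern GNU tar infers the compression from the
# archive itself, so plain -xf handles either format.
mkdir -p unpack
tar -xf docs-html.tar.gz -C unpack
ls unpack/docs-html
```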
mnaser | i see | 19:08 |
*** y2kenny has joined #zuul | 19:09 | |
mnaser | clarkb: cool, i just tested and for what it's worth it worked fine here -- https://object-storage-ca-ymq-1.vexxhost.net/v1/bfd521072e894ebb99e66f72619daa8a/vexxhost-ci_c48/488037/6/check/tox-docs-oopt-gnpy/c486a4e/ | 19:13 |
AJaeger | mnaser: https://opendev.org/openstack/project-config/src/branch/master/playbooks/api-jobs/promote.yaml#L25 needs updating, doesn't it? | 19:16 |
mnaser | AJaeger: ah yes, that does impact that behaviour | 19:16 |
AJaeger | mnaser: http://codesearch.openstack.org/?q=docs-html.tar.bz2&i=nope&files=&repos= | 19:16 |
mnaser | hmm | 19:17 |
mnaser | i wonder if we would consider this behaviour change. | 19:17 |
mnaser | aka we should do a 14 day timeout on this | 19:17 |
mnaser | (maybe add a setting in the meantime to unblock people) | 19:17 |
corvus | (i think the bzip problem is that it needs to be on the work node; it also needs to be on the executor, but that's an easier problem to solve) | 19:18 |
mnaser | yeah, the executor is the tricky bit. for example, our current zuul images don't have bzip2 in them | 19:18 |
corvus | that's easy to solve, we can just add it:) | 19:19 |
corvus | that won't break anyone | 19:19 |
AJaeger | mnaser: for opendev/openstack it's for sure a behavior change. Could we rewrite the promote job to handle both bzip2 and gzip? | 19:19 |
corvus | mnaser: i think if we just add it to bindep, it'll end up in the image | 19:20 |
mnaser | AJaeger: yeah, we either support both for a while -- or at least we solve the bzip2 issue first inside the executor image | 19:20 |
AJaeger | mnaser: I added a WIP to the change | 19:20 |
mnaser | oh look at that, we already have bzip2 in there | 19:21 |
mnaser | but just for platform:rpm | 19:21 |
mordred | useful | 19:21 |
corvus | so 1) add it to image to handle executor-side. 2) for the worker side, we either: 2a) add a flag and 14 day notice or 2b) make promote handle both, then switch fetch to gzip. that *probably* won't break anyone (though technically it could). | 19:22 |
corvus | (i'm thinking that 2b could be done with minimal notice, but i could be talked out of that and into giving more notice for 2b too if people think that's risky) | 19:22 |
corvus | (i'm basically just assuming any platform that has bzip2 installed probably also has gzip) | 19:22 |
AJaeger | corvus: before we do 2b, I would do a short poll to our users here... | 19:23 |
openstackgerrit | Mohammed Naser proposed zuul/zuul master: bindep: add bzip2 to all platforms https://review.opendev.org/715041 | 19:23 |
mnaser | that's #1 and should be relatively non-impactful ^ | 19:23 |
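For context, bindep entries take optional profile selectors in brackets, and a bare package name applies on every platform. A sketch of the before/after shape of such a change (not the literal diff):

```
# restricted: only installed on rpm-based platforms
bzip2 [platform:rpm]

# unrestricted: installed on every platform
bzip2
```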
mnaser | i wonder if we can refactor the promotion stuff to be a little more flexible and not using hard coded file names | 19:27 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Add a role to install and configure dstat to run in the background https://review.opendev.org/518374 | 19:27 |
mnaser | because we're downloading "docs_archive" but then later referencing "docs-html.tar.bz2" | 19:27 |
mnaser | i guess we could return a dictionary mapping key to file, so that you can reference zuul_artifacts['docs_archive'] and we would get rid of that hard-coded bit | 19:28 |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Pin docker images to 3.7 explicitly https://review.opendev.org/715043 | 19:28 |
AJaeger | mnaser: would be cool if we can do that. | 19:28 |
mnaser | seems possible | 19:29 |
y2kenny | For the kubernetes nodepool driver, how does the user define the kind of pod to be launched for labels of type namespace? | 19:32 |
y2kenny | for the labels.type: pod I can see the field for defining an image | 19:33 |
y2kenny | since nodepool receives instructions from the scheduler, maybe there's a way to define it in the job? but I am not sure. | 19:35 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Add a role to install and configure dstat to run in the background https://review.opendev.org/518374 | 19:35 |
*** hashar is now known as hasharDinner | 19:35 | |
*** irclogbot_3 has quit IRC | 19:37 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: download-artifacts: provide a dictionary https://review.opendev.org/715045 | 19:39 |
clarkb | y2kenny: I think the pod type is pretty simple/generic. If you need fine grained control you can request a namespace instead then construct what you need | 19:39 |
mnaser | AJaeger: ^ i wonder how we can test that... | 19:39 |
clarkb | y2kenny: looking at docs the pod's only configurable attribute is the image to run in it | 19:39 |
y2kenny | clarkb: for the namespace type, then do I construct what I need via nodepool or do it independently of zuul | 19:40 |
y2kenny | or can I specify some ansible roles and nodepool would understand it some how? | 19:40 |
fungi | i think the intent is that the job declares it needs a namespace and then creates whatever is desired within it as part of the job definition/playbooks | 19:41 |
clarkb | y2kenny: nodepool will return a set of connection and namespace details to zuul, then using that you write ansible to create pods and things | 19:41 |
*** Openk10s has quit IRC | 19:41 | |
clarkb | what nodepool provisions in that case is the namespace itself (and permissions on the namespace iirc) | 19:41 |
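The two label types being discussed can be illustrated roughly like this (a hedged sketch following the nodepool kubernetes driver's documented schema; the provider, context, and image names here are invented):

```yaml
providers:
  - name: k8s-example          # invented provider name
    driver: kubernetes
    context: example-context   # kubeconfig context, invented
    pools:
      - name: main
        labels:
          # type "pod": nodepool launches one pod from the given
          # image and hands it to the job as a node.
          - name: pod-fedora
            type: pod
            image: docker.io/fedora:31
          # type "namespace": nodepool only creates an empty
          # namespace plus credentials; the job's playbooks then
          # create whatever pods are needed inside it.
          - name: k8s-namespace
            type: namespace
```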
*** irclogbot_0 has joined #zuul | 19:41 | |
y2kenny | um... interesting... | 19:42 |
y2kenny | I will give both a try | 19:42 |
*** irclogbot_0 has quit IRC | 19:42 | |
clarkb | tristanC probably has some example that can be shared (of both cases) | 19:43 |
tristanC | y2kenny: this is how software-factory sets up the base job for both kubernetes node types: https://softwarefactory-project.io/cgit/config/tree/zuul.d/_jobs-openshift.yaml | 19:45 |
tristanC | iirc this was also proposed in zuul-base-jobs there: https://review.opendev.org/#/c/570669/ | 19:46 |
tristanC | well we are relying on openshift BuildConfig to build speculative image | 19:46 |
*** irclogbot_3 has joined #zuul | 19:46 | |
y2kenny | tristanC: thanks, I was just about to ask that. So looks like you pass the image via a variable "base_image" | 19:47 |
tristanC | y2kenny: that's what is used for the BuildConfig input | 19:47 |
AJaeger | mnaser: sorry, too tired now to think this through ... I'll sign off now. Thanks for tackling this! | 19:48 |
mnaser | AJaeger: no worries, have a good night | 19:48 |
AJaeger | thansk | 19:48 |
y2kenny | tristanC: um... the openshift driver seems to have a lot more fields for control than the k8s one in nodepool | 19:50 |
corvus | y2kenny: other than image, what do you want to specify? | 19:50 |
y2kenny | How does BuildConfig fit in with job and nodepool? is BC something the nodepool driver will call? | 19:50 |
tristanC | y2kenny: iiuc, there is no way to build an image in vanilla k8s | 19:50 |
y2kenny | oh, I think just a pod image would be a good start. But in the Openshift driver, you can specify CPU and memory limits as well | 19:51 |
tristanC | y2kenny: that's because openshift would terminate the pod without the limit | 19:52 |
y2kenny | so I am wondering how flexible things are. Like, how much of k8s API objects are exposed and configurable | 19:52 |
corvus | i think the k8s driver could be extended to support things like that fairly easily; it would all be in nodepool, so you'd still specify it by label from zuul, and that label would have those extra attributes. | 19:52 |
corvus | sort of like how we specify vm flavors | 19:52 |
y2kenny | corvus: I see. Yea, I was wondering about that because you do have a nodepool-builder. So some of these seem to fall under the function of the builder | 19:53 |
y2kenny | like, perhaps the builder can take some kind of k8s yaml files. | 19:54 |
y2kenny | although it's slightly different than building an artifact | 19:55 |
corvus | y2kenny: well the builder is for building vm images (it could be used for building container images, but honestly, it might be better to do that all in zuul; the tools are pretty good there). but things like cpu limits are pod runtime options, so i think they make more sense as launcher options. | 19:55 |
y2kenny | another question I have is whether nodepool builder will eventually build container images | 19:55 |
y2kenny | corvus: ah ok, you just answered my question. | 19:56 |
*** hasharDinner is now known as hashar | 19:57 | |
corvus | yeah, i think in our original ideas for k8s support in nodepool we were thinking we'd use the builder for that, but it's so easy to build and publish images in zuul jobs, and even stress test them before publishing them, that i think it's probably a better experience | 19:57 |
corvus | we don't really have an example of that though, just all the pieces are there i think. | 19:57 |
y2kenny | corvus: so when you say build and publish container images in a zuul jobs, is that done with k8s node or just a regular node that can launch docker? | 19:59 |
corvus | the second | 20:00 |
corvus | y2kenny: i've never tried to do the first, but i do believe tristanC when he says it's difficult/impossible | 20:00 |
*** irclogbot_3 has quit IRC | 20:00 | |
y2kenny | ok.... yea.. I was actually digging into that for the last week or so. | 20:00 |
mordred | and the second is very good at its job | 20:00 |
y2kenny | there are possibilities | 20:00 |
clarkb | you need a special image builder like img to do it in the non privileged case | 20:00 |
clarkb | it's doable with constraints and special tools | 20:01 |
y2kenny | especially now that there's so much hype around the term "gitops" | 20:01 |
mordred | clarkb: and even for that you need ultra new host kernel and you need to do things with capabilities and whatnot | 20:01 |
y2kenny | there's kaniko | 20:01 |
y2kenny | but I haven't tried that myself | 20:01 |
*** jamesmcarthur has quit IRC | 20:01 | |
clarkb | semi-related: I've discovered the practice of mounting the docker socket into containers | 20:02 |
tristanC | y2kenny: corvus: the challenge is that you need recursive container | 20:02 |
clarkb | because if you didn't have enough privs before, just add full control :) | 20:02 |
corvus | tristanC: but maybe img works? | 20:02 |
*** jamesmcarthur has joined #zuul | 20:03 | |
corvus | or buildah? | 20:03 |
y2kenny | this is what I came across before: | 20:03 |
y2kenny | https://github.com/GoogleContainerTools/kaniko | 20:03 |
mordred | yeah - the thing is - that's structured in a way that makes it hard to use the power of zuul | 20:03 |
tristanC | iiuc, last time i checked, to build a container inside a container still requires some trick, e.g. an overlayfs setting and running the nested container using a special isolation mode | 20:04 |
mordred | it assumes a version of gitops that goes "I'm going to push a commit to git and I want a system to respond to that published git commit" | 20:04 |
mordred | which is neat, I suppose, but misses the power of being able to do all of that before the commit lands | 20:04 |
tristanC | or you do like clarkb suggested, share the docker socket, but then you no longer have isolation | 20:04 |
*** irclogbot_1 has joined #zuul | 20:05 | |
fungi | the gitops movement is all about testing in production and relying on self-healing to plaster over whatever you've broken long enough to undo it | 20:05 |
mordred | we prefer doing gated gitops - where operators are not pushing git commits, but are instead reviewing proposed commits, having images built in gate that are then promoted once gate passes - so that the image that was tested is the image that is deployed | 20:05 |
tristanC | mordred: hence the `create staging-http ImageStream` task from https://review.opendev.org/#/c/570669/8/playbooks/openshift/pre.yaml | 20:05 |
corvus | this is relevant: https://developers.redhat.com/blog/2019/08/14/best-practices-for-running-buildah-in-a-container/ | 20:06 |
mordred | tristanC: yeah. I mean - it's an approach that exists - but it's _massively_ more complicated than running docker build in a vm which works GREAT | 20:06 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 20:06 |
corvus | fungi: well, that's one definition of gitops, one to which i don't subscribe. | 20:06 |
fungi | yes, i didn't mean to imply that doing gitops is the same as subscribing to the mass popularized "gitops movement" | 20:07 |
y2kenny | mordred: yea, that's why I haven't found a good gitops solution. They all seem more complicated than I need them to be. | 20:08 |
*** jamesmcarthur has quit IRC | 20:08 | |
fungi | i don't think the average gitopser is opposed to the idea of testing things before they go into production, they just don't know about the tools which would enable them to do so, and have an ecosystem designing automation which makes it harder to do that | 20:08 |
corvus | tristanC: if i understand https://developers.redhat.com/blog/2019/08/14/best-practices-for-running-buildah-in-a-container/ correctly, using buildah should be possible -- it looks like they made a special 'buildah' container image which has all the extra tools set up that's needed to do that | 20:09 |
corvus | so you'd build your new container image inside the buildah container. i think. | 20:10 |
tristanC | mordred: right, but it seems like the whole point of recursive containers is to not require a vm. | 20:10 |
corvus | personally, i really like building container images on vms. but if i were constrained to container only, it might be worth looking into that? | 20:10 |
tristanC | corvus: last time i tried, it was quite difficult to make it work. if that can work in vanilla k8s, then we should totally document that | 20:11 |
corvus | tristanC: there's a comment at the bottom from someone who identified the missing pieces from that :) | 20:11 |
y2kenny | I am hoping to generalize the base infrastructure as much as possible | 20:11 |
corvus | and, i guess, documented them in the form of publishing their own container image? :/ | 20:12 |
y2kenny | another possibility is do something like kubevirt (VM on top of container)... but now it's turtles all the way down | 20:12 |
*** irclogbot_1 has quit IRC | 20:12 | |
corvus | y2kenny: i will gladly attend your conference talk about that :) | 20:12 |
tristanC | corvus: my last attempt is documented in https://github.com/containers/buildah/issues/1335 , and it seems like the past solution was to share /dev/fuse | 20:12 |
corvus | tristanC: i think the blog post is after that; maybe that's what prompted dan to write it | 20:13 |
*** y2kenny has quit IRC | 20:14 | |
corvus | it looks like all the fuse stuff is in-container, so that doesn't seem troubling | 20:14 |
corvus | but i admit, my knowledge comes from skimming that post very quickly | 20:14 |
*** jamesmcarthur has joined #zuul | 20:15 | |
*** y2kenny has joined #zuul | 20:15 | |
tristanC | corvus: iiuc, in vanilla k8s you can already share the host docker socket by default, so that's only useful in a locked-down environment | 20:15 |
corvus | well, i think even in a non-locked-down environment, it's very reasonable not to want to share the docker socket | 20:17 |
tristanC | corvus: in that case, you need to set up a pod security policy to prevent arbitrary hostPath | 20:17 |
corvus | tristanC: if you don't trust your users (which are usually yourself) | 20:18 |
corvus | tristanC: i'm imagining a nodepool running against a k8s with a label configured for the 'buildah' container image; a zuul job could request that label, get a buildah pod, and build an image. | 20:18 |
tristanC | corvus: yes, that would be ideal, and similar to what an openshift buildconfig does. but it's not clear what the requirements and psp are for such a 'buildah' image in vanilla k8s | 20:20 |
corvus | tristanC: agreed; dan's post makes me think it should be possible, but it warrants more research or experimentation. | 20:20 |
*** irclogbot_1 has joined #zuul | 20:22 | |
corvus | mordred, paladox: i have merged the change to add the checks to all the core plugins | 20:32 |
mordred | corvus: woot! | 20:32 |
corvus | i intentionally merged that first before merging the one that added the jobs | 20:32 |
corvus | this way zuul got events for all the currently outstanding changes in those repos, and declared the checks "not relevant" | 20:33 |
mordred | corvus: ah - good approach | 20:33 |
corvus | (an interesting thing about the checks plugin is when you turn it on for a repo, you get the backlog. that can be good or bad) | 20:33 |
corvus | i figured running a whole bunch of jobs on old changes wasn't ideal right now | 20:33 |
corvus | so now i'll merge the change to add the jobs, and they'll run on new patchsets, or if we click the re-run button | 20:34 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 20:34 |
mordred | agree. but I think doing gerrit + core plugin cross testing is going to be a _great_ demonstration | 20:34 |
corvus | yeah... i even saw a depends-on :) | 20:35 |
corvus | https://gerrit-review.googlesource.com/c/plugins/replication/+/145851/ | 20:35 |
corvus | is a change from paladox | 20:35 |
mordred | \o/ | 20:35 |
mordred | corvus: https://gerrit.googlesource.com/plugins/zuul/ <-- maybe we should get the gerrit folks to add that to the gerrit :) | 20:35 |
corvus | i was watching the status page, and was pleasantly surprised to see two changes linked | 20:36 |
mordred | corvus: also - maybe we should play with that and consider adding it to our own gerrit installs - the ui showing a needed-by reference seems nice | 20:36 |
corvus | ++ | 20:37 |
mordred | https://gerrit.googlesource.com/plugins/zuul/+/refs/heads/master/src/main/resources/Documentation/rest-api-changes.md | 20:38 |
mordred | corvus: oh - well, the UI stuff is in GWT and hasn't been ported | 20:39 |
corvus | yeah, maybe later :) | 20:39 |
mordred | so there might be more work than just "enable it" | 20:39 |
*** tosky has quit IRC | 20:40 | |
paladox | corvus awesome! | 20:43 |
paladox | mordred yeh, i think i added a plugin endpoint to PG UI for zuul | 20:44 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 20:49 |
paladox | mordred oh, wasn't me that added it (i was just remembering a comment i made): https://gerrit-review.googlesource.com/c/gerrit/+/214632 | 20:49 |
*** tosky has joined #zuul | 20:54 | |
*** rfolco has quit IRC | 20:58 | |
ianw | corvus / Shrews : http://eavesdrop.openstack.org/irclogs/%23openstack-dib/%23openstack-dib.2020-03-25.log.html#t2020-03-25T09:35:48 is talking about dib-run-parts seeming to not have +x permissions. feels suspiciously like our issues yesterday with fake-image-create somehow not being installed +x any more | 21:02 |
*** hashar has quit IRC | 21:06 | |
corvus | ianw: indeed | 21:08 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 21:09 |
*** jamesmcarthur has quit IRC | 21:21 | |
*** jamesmcarthur has joined #zuul | 21:21 | |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 21:23 |
*** jamesmcarthur has quit IRC | 21:26 | |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 21:33 |
*** dpawlik has quit IRC | 21:35 | |
corvus | 2020-03-25 21:36:29,033 WARNING zuul.SQLReporter: [e: 361b9f50ae8c4ee9b33e43eaed6f85d3] SQL reporter (<zuul.driver.sql.sqlreporter.SQLReporter object at 0x7fbc9c1f0cd0>) is disabled | 21:38 |
corvus | i have not encountered that before | 21:38 |
corvus | apparently that happens if something is wrong with the connection or tables... | 21:39 |
*** jamesmcarthur has joined #zuul | 21:41 | |
corvus | it looks like a bunch of pods were restarted 4 days ago; i wonder if the scheduler raced the db starting, and we don't recover from that? | 21:42 |
mordred | corvus: eww. that's not awesome | 21:43 |
corvus | i've restarted gerrit's zuul's scheduler to try to confirm | 21:44 |
corvus | if that is the problem, maybe we can/should fix that by... just not disabling the reporter :) | 21:44 |
corvus | (but we'd still need to make sure that we're at the current migration before reporting) | 21:45 |
mordred | corvus: yeah - also - it seems like being able to exist without disabling the reporter is going to be important when the reporter is required | 21:47 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 22:01 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 22:06 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 22:18 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 22:42 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 22:45 |
corvus | it looks like the scheduler restart fixed that | 22:47 |
corvus | mordred, paladox: and https://gerrit-review.googlesource.com/c/plugins/download-commands/+/259892 has a (green!) build | 22:49 |
paladox | \o/ | 22:49 |
mordred | corvus: woot! | 22:49 |
corvus | paladox: we aren't running any tests yet, do you know if we should run "bazelisk test //..." or something like that? | 22:54 |
corvus | bazel/bazelisk are still somewhat of a mystery to me | 22:54 |
paladox | yes, running that should run the tests. | 22:54 |
corvus | cool, i'll try that out then :) | 22:54 |
paladox | not all plugins have tests though | 22:54 |
mordred | paladox: will bazelisk test break in a plugin if it doesn't have tests? | 22:55 |
openstackgerrit | Merged zuul/zuul-jobs master: upload-logs-swift: Create a download script https://review.opendev.org/592341 | 22:55 |
openstackgerrit | Merged zuul/zuul-jobs master: upload-logs-swift: Add a unicode file https://review.opendev.org/592853 | 22:55 |
mordred | or - if it will - is there a way to tell if a plugin has tests? like a path to look for or something? | 22:55 |
paladox | mordred i'm not sure (haven't tested). I would think it'll pass just like if there are tests. | 22:55 |
mordred | ok. that would be ideal | 22:56 |
corvus | well, this is really the gerrit tests, not the plugin tests | 22:56 |
paladox | ah, yeh that should pickup all the tests | 22:57 |
paladox | *in gerrits core | 22:57 |
mordred | oh right. duh | 22:57 |
corvus | i'm assuming running gerrit's tests is the right thing to do for a core plugin | 22:57 |
corvus | but i dunno :) | 22:57 |
mordred | core plugins | 22:57 |
mordred | corvus: I'd guess so - because in-tree | 22:57 |
paladox | corvus yup | 22:57 |
paladox | since it'll recurse into plugins i think? | 22:58 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 22:58 |
corvus | i sent an update to repo-discuss | 23:07 |
openstackgerrit | Alex Schultz proposed zuul/zuul-jobs master: Improve the run-dstat role https://review.opendev.org/518374 | 23:08 |
paladox | corvus \o/ | 23:10 |
*** y2kenny9 has joined #zuul | 23:18 | |
*** y2kenny has quit IRC | 23:19 | |
*** y2kenny9 has quit IRC | 23:37 | |
*** y2kenny has joined #zuul | 23:37 | |
y2kenny | I noticed a debug message in the executor log: "Ansible output: b"bwrap: Can't mount proc on /newroot/proc: Operation not permitted" Is that something to worry about? | 23:38 |
*** armstrongs has joined #zuul | 23:41 | |
*** armstrongs has quit IRC | 23:50 | |
*** rlandy has quit IRC | 23:51 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!