*** jimi|ansible has joined #zuul | 00:01 | |
mordred | SpamapS: re: markdownlint | 00:16 |
mordred | SpamapS: the pre playbook runs npm - but there isn't anything ensuring npm exists ... we have an install-nodejs role, but that's for installing from nodesource and might be overkill | 00:17 |
mordred | SpamapS: NEVERMIND - it has been taken care of and also covered in previous review comments | 00:18 |
mordred | SpamapS: maybe one day I'll learn to read. probably not - but maybe | 00:18 |
clarkb | reading is hard | 00:20 |
mordred | clarkb: srrsly | 00:20 |
*** tobiash has quit IRC | 00:21 | |
*** ssbarnea has joined #zuul | 00:22 | |
*** tobiash has joined #zuul | 00:23 | |
mordred | Shrews, corvus: I had this thought about nodepool config ... it happened as I was thinking about the changes to the rate-limiting config that are/will be possible once the current stack finishes landing | 00:27 |
mordred | Shrews, corvus: in the ansible modules, we recently added the ability to pass an entire clouds.yaml style cloud config dict to the cloud parameter | 00:27 |
mordred | so - if it's a scalar, it's a cloud name, but if it's a dict, we pass it in to the constructor as kwargs | 00:27 |
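To make the scalar-or-dict distinction concrete, the two forms look roughly like this in an Ansible task; this is an illustrative sketch, and the cloud name, auth values, image, and flavor are placeholders rather than values from this discussion:

```yaml
# Scalar form: "cloud" names an entry in clouds.yaml
- os_server:
    cloud: mycloud
    name: test-node
    image: ubuntu-16.04
    flavor: m1.small

# Dict form: a whole clouds.yaml-style config is passed through
# to the cloud constructor as kwargs
- os_server:
    cloud:
      auth:
        auth_url: https://keystone.example.com/v3
        username: nodepool
        password: "{{ vaulted_cloud_password }}"
        project_name: ci
      region_name: RegionOne
    name: test-node
    image: ubuntu-16.04
    flavor: m1.small
```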
mordred | Shrews, corvus: do you think adding a similar thing to nodepool would be an improvement or just confusing and terrible? | 00:28 |
mordred | (the thing that made me start thinking about this in the first place is that, with the next openstacksdk release, it will be possible to configure rate limits in clouds.yaml - and also to configure them separately per-service) | 00:29 |
mordred | I'm not sure whether adding per-service rate-limit support directly to nodepool's config language is a valuable thing though - or whether a passthrough cloud dict is either (we do it in ansible because people want to store their config in vault and whatnot and the clouds.yaml files get confusing in that context) | 00:31 |
mordred | anyway- not urgent or anything - just thoughts happening while dealing with airplanes | 00:31 |
corvus | mordred: but with rate limits going into openstacksdk(/keystone?) wouldn't the most desirable future state be one where we retire the use of rate limit stuff in nodepool and defer that entirely to clouds.yaml? i see this as an opportunity to further simplify nodepool config... | 00:35 |
mordred | corvus: yah - that's probably the right option | 00:36 |
Shrews | i feel like moving that out of nodepool is the clearer thing to do, too | 00:36 |
corvus | mordred: to answer the question another way -- if there's a big win by supporting the scalar-or-dict thing, i think it would work fine, however, right now it strikes me as complexity without gain, and configuring this stuff is hard enough as-is... it'd be an easy sell though if we find the right use case | 00:36 |
mordred | ++ | 00:37 |
mordred | cool. less patches for me to write :) | 00:37 |
Shrews | those were the exact words i had in my head | 00:37 |
Shrews | corvus: quick! what number am i thinking of? | 00:37 |
corvus | Shrews: -2? | 00:38 |
Shrews | sorry, the answer was "blue". i feel safer now | 00:39 |
corvus | Shrews: are you sure it wasn't yellow? | 00:39 |
Shrews | O.o | 00:39 |
Shrews | tin foil hat time | 00:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Consume rate limiting task manager from openstacksdk https://review.openstack.org/612169 | 00:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Remove task manager https://review.openstack.org/612170 | 00:46 |
*** gouthamr has joined #zuul | 00:47 | |
*** dmellado has joined #zuul | 00:51 | |
*** rlandy has quit IRC | 01:27 | |
dmsimard | I'm trying ansible-runner out of curiosity | 01:27 |
dmsimard | and it's not bad | 01:28 |
*** bhavikdbavishi has joined #zuul | 01:43 | |
*** bhavikdbavishi has quit IRC | 01:50 | |
*** bhavikdbavishi has joined #zuul | 02:23 | |
*** bhavikdbavishi has quit IRC | 02:40 | |
*** bhavikdbavishi has joined #zuul | 03:35 | |
*** mrhillsman is now known as openlab | 04:05 | |
*** openlab is now known as mrhillsman | 04:06 | |
*** rfolco|rover has quit IRC | 04:30 | |
*** spsurya has joined #zuul | 05:48 | |
AJaeger | zuul cores, time to remove an ancient role from zuul-jobs - please review https://review.openstack.org/610381 | 06:21 |
*** bhavikdbavishi1 has joined #zuul | 06:53 | |
*** bhavikdbavishi has quit IRC | 06:55 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:55 | |
*** pcaruana has joined #zuul | 06:56 | |
*** themroc has joined #zuul | 07:24 | |
*** sshnaidm|afk is now known as sshnaidm|pto | 07:41 | |
*** electrofelix has joined #zuul | 08:29 | |
*** gouthamr has quit IRC | 08:32 | |
*** dmellado has quit IRC | 08:34 | |
*** cbx33 has joined #zuul | 08:58 | |
*** nilashishc has joined #zuul | 09:09 | |
*** gouthamr has joined #zuul | 09:11 | |
*** cbx33 has quit IRC | 09:17 | |
*** dmellado has joined #zuul | 09:18 | |
*** jpena|off is now known as jpena | 09:22 | |
*** rfolco has joined #zuul | 10:12 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove the "emit-ara-html" role https://review.openstack.org/610381 | 10:14 |
*** ssbarnea_ has joined #zuul | 10:16 | |
*** ianychoi has quit IRC | 10:22 | |
*** ianychoi has joined #zuul | 10:25 | |
*** rfolco is now known as rfolco|rucker | 11:23 | |
*** panda is now known as panda|lunch | 11:27 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** nilashishc has quit IRC | 11:39 | |
*** nilashishc has joined #zuul | 12:19 | |
*** bhavikdbavishi has quit IRC | 12:22 | |
*** rlandy has joined #zuul | 12:29 | |
*** jpena|lunch is now known as jpena | 12:40 | |
corvus | tobiash: do you still see the same error in https://review.openstack.org/597147 ? | 13:43 |
tobiash | corvus: will re-do the check | 13:50 |
openstackgerrit | Merged openstack-infra/zuul master: web: Increase height and padding of zuul-job-result https://review.openstack.org/610980 | 13:51 |
tobiash | corvus: regarding 610029: iterating over the node requests took so long even with caching (we have probably more providers than is good currently in one instance) | 13:52 |
tobiash | but nevertheless we had a very long list of open requests and quota calculation for each request took a few seconds even with znode caching | 13:53 |
tobiash | so I think this safety net is still useful in case of overload situations | 13:53 |
openstackgerrit | Merged openstack-infra/zuul master: encrypt_secret: support OpenSSL 1.1.1 https://review.openstack.org/611414 | 13:54 |
corvus | tobiash: ok. i'm fine merging that and then continuing to work to speed things up. | 13:58 |
tobiash | corvus: thanks, that's also my main focus area atm | 13:59 |
tobiash | (stability and performance) | 14:01 |
goern | tobiash, are you running jobs on VM or in pods? | 14:01 |
tobiash | goern: our jobs run on vms | 14:01 |
tobiash | zuul in pods | 14:01 |
goern | uh, zuul itself in pods :) | 14:02 |
goern | need to talk to tristanC about putting software-factory in pods :) | 14:02 |
*** panda|lunch is now known as panda | 14:04 | |
* tobiash just increased the zuul-executor pods to 12 | 14:04 | |
goern | and that's a real bottleneck for me right now... the vm running the executors is way too slow :/ | 14:05 |
tobiash | corvus: I cannot reproduce the error in 597147 anymore. I guess the needed change in the api was not there yet in openstack at that time? | 14:10 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Ignore removed provider in _cleanupLeakedInstances https://review.openstack.org/608670 | 14:18 |
corvus | tobiash: that sounds plausible | 14:18 |
tobiash | corvus: so I changed -1 to +2 | 14:19 |
tristanC | goern: well the issue remains that zuul executor needs privileged pods, which iiuc is not acceptable with multi-tenant openshift deployment... | 14:23 |
tobiash | tristanC: true, so I have my own 'single-tenant' openshift deployment | 14:39 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Cleanup node requests that are declined by all current providers https://review.openstack.org/610915 | 14:48 |
goern | tristanC, tobiash do we know why and what privs the executors need? | 15:02 |
tobiash | goern: the executors use bwrap (a sandboxing tool) to jail the jobs | 15:03 |
openstackgerrit | Merged openstack-infra/zuul master: Exclude .keep files from .gitignore https://review.openstack.org/611990 | 15:03 |
openstackgerrit | Merged openstack-infra/zuul master: Add a sanity check for all refs returned by Gerrit https://review.openstack.org/599011 | 15:03 |
tobiash | that needs privs | 15:03 |
openstackgerrit | Merged openstack-infra/zuul master: Reload tenant in case of new project branches https://review.openstack.org/600088 | 15:03 |
goern | tobiash, don't we run jobs in pods?! so why do we need bwrap? | 15:04 |
mordred | goern: there is a hypothesis that it could be possible to have bwrap request less capabilities - but to my knowledge nobody has had the time to investigate whether or not that is actually possible | 15:04 |
mordred | goern: we wrap the execution of ansible-playbook, which runs on the executor, in bwrap | 15:04 |
mordred | as a defense in depth - to go along with ansible-level restrictions preventing local code execution | 15:05 |
tobiash | goern: no, the jobs run in the executor pods (and can do localhost stuff) | 15:05 |
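For readers unfamiliar with bwrap, the jail mordred and tobiash describe is conceptually a command line along these lines; this is an illustrative sketch of wrapping ansible-playbook in bubblewrap, not the actual invocation Zuul constructs, and the build directory and playbook paths are made up:

```shell
bwrap --die-with-parent \
      --unshare-all --share-net \
      --ro-bind /usr /usr --ro-bind /lib /lib \
      --ro-bind /etc/resolv.conf /etc/resolv.conf \
      --bind /var/lib/zuul/builds/example /work \
      --proc /proc --dev /dev --tmpfs /tmp \
      ansible-playbook /work/playbooks/run.yaml
```

Creating the namespaces and bind mounts is what needs extra capabilities under a default pod security context, which is why the executor pod currently has to run privileged.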
goern | oh ja... | 15:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Cleanup node requests that are declined by all current providers https://review.openstack.org/610915 | 15:17 |
openstackgerrit | Merged openstack-infra/zuul master: Use merger to get list of files for pull-request https://review.openstack.org/603287 | 15:25 |
openstackgerrit | Merged openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP https://review.openstack.org/603833 | 15:25 |
openstackgerrit | Merged openstack-infra/zuul master: encrypt_secret: Allow file scheme for public key https://review.openstack.org/581429 | 15:25 |
tristanC | goern: my proposal was to spawn a new executor pod for each job to remove the need for local bwrap isolation, but that's a substantial refactor... | 15:28 |
goern | tristanC, but I think that is the right way to move zuul forward into a cloud native world | 15:29 |
tristanC | there are more details in this thread: http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-July/000477.html | 15:31 |
corvus | yep, if anyone has time to look into the bwrap capabilities question mordred described, that's the next step. | 15:34 |
corvus | tristanC: are you planning on looking at the nodepool k8s functional test failures? it looks like it's catching a real bug in the k8s driver | 15:35 |
*** ianychoi_ has joined #zuul | 15:36 | |
tristanC | corvus: i'm still on vacation atm, and i probably won't have time for that before the summit | 15:37 |
corvus | tristanC: oh vacation! don't worry about it then! no rush. :) | 15:38 |
corvus | tristanC: you're just around so much it didn't seem like you were on vacation. ;) | 15:38 |
*** ianychoi has quit IRC | 15:40 | |
Shrews | corvus: speaking of that, does the direction in https://review.openstack.org/609515 seem sensible to you? | 15:48 |
Shrews | as far as organizing the tox tests | 15:48 |
Shrews | tl;dr.... current tests -> tests/unit/ , driver func tests -> tests/functional/<driver>/ | 15:50 |
Shrews | func tests will be a separate job per driver. didn't make sense to have a single job to set up *all* of the potential backends on a single node | 15:53 |
clarkb | The first paragraph of https://github.com/projectatomic/bubblewrap#user-namespaces is the important part re privilege vs unprivileged bwrap aiui | 15:55 |
clarkb | I think if your container provider considered user namespaces secure for unprivileged users then bwrap would run fine | 15:55 |
clarkb | for example on my tumbleweed machine where non root without setuid can run bwrap | 15:55 |
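A quick way to check whether a given host matches what clarkb describes (unprivileged user namespaces available, or a setuid bwrap binary); note the sysctl name below is the Debian/Ubuntu kernel knob and may not exist on other kernels:

```shell
# 1 means unprivileged users may create user namespaces (Debian/Ubuntu kernels)
sysctl kernel.unprivileged_userns_clone || true

# If bwrap is setuid root it does not need user namespaces at all
ls -l "$(command -v bwrap)"

# Smoke test: can an unprivileged user sandbox a trivial command?
bwrap --ro-bind / / --unshare-all true && echo "unprivileged bwrap works"
```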
corvus | Shrews: that looks great | 15:57 |
*** themroc has quit IRC | 16:00 | |
*** rfolco|rucker is now known as rfolco|brb | 16:01 | |
tobiash | goern: btw, nodepool builder also needs privs | 16:01 |
openstackgerrit | Merged openstack-infra/zuul master: web: add config-errors notifications drawer https://review.openstack.org/597147 | 16:07 |
Shrews | Has anyone else noticed that zk in our tests seems to be getting less and less reliable? | 16:39 |
Shrews | at least in nodepool. not sure about zuul | 16:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: DNM: testing zookeeper oddities https://review.openstack.org/612750 | 16:46 |
clarkb | Shrews: ya one theory I had was maybe running 8 test processes didn't give zookeeper time to get the cpu | 16:47 |
clarkb | not easy to test that, but did push up a change that runs numcpu - 1 test processes | 16:47 |
*** caphrim007 has joined #zuul | 16:48 | |
Shrews | clarkb: well, the case I just looked at, zookeeper connection was established, but then it was suspended for whatever reason | 16:48 |
Shrews | maybe still cpu issue? dunno | 16:49 |
clarkb | Could that happen if your connection times out? | 16:49 |
clarkb | I'm not sure how zk handles a timed out connection when you run without responding for too many ticks | 16:49 |
Shrews | maybe. but it was less than 2 seconds between connection and suspension | 16:51 |
Shrews | http://logs.openstack.org/98/605898/2/gate/tox-py36/9d5cf96/testr_results.html.gz | 16:51 |
Shrews | for reference | 16:51 |
corvus | zuul tests have been having issues too, which also are consistent with a cpu contention hypothesis | 16:52 |
Shrews | clarkb: where was your change you referenced? | 16:52 |
clarkb | https://review.openstack.org/#/c/561037/ | 16:52 |
Shrews | does that still work with stestr? | 16:53 |
clarkb | it should, but likely have to move the config to the stestr config file. /me reads some docs | 16:53 |
tobiash | Shrews, corvus: do you think it's cpu or io contention? | 16:54 |
Shrews | tobiash: we don't know what it is, thus our speculative poking :) | 16:55 |
clarkb | https://stestr.readthedocs.io/en/latest/MANUAL.html#user-config-files it may not dynamically evaluate it as shell though | 16:55 |
clarkb | it is configurable, but only to an integer value, not a "run this command to set the value" value | 16:56 |
Shrews | we could also do the command line option i suppose | 16:57 |
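The command-line route Shrews mentions would look something like the following; the exact arithmetic is an assumption about how one might leave headroom, not a quote from the change clarkb linked:

```shell
# Leave one CPU free for ZooKeeper/MySQL instead of spreading tests across all of them
stestr run --concurrency "$(( $(nproc) - 1 ))"
```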
tobiash | I'm probably a little bit biased towards io since I'm mostly dealing with io bottlenecks here | 16:57 |
clarkb | check the dstat addition to zuul jobs change? | 16:58 |
clarkb | fwiw it did look like there was disk io contention on ovh but not the other clouds | 16:58 |
Shrews | hrm, this was an ovh node | 16:59 |
clarkb | I think ovh limits by iops | 16:59 |
clarkb | so lots of small writes (probably what zk is doing actually) are "slow" but big writes are fast | 16:59 |
Shrews | but a similar failure on rax | 17:00 |
caphrim007 | are there any zuul services which *dont* require the zuul.conf? | 17:02 |
tobiash | caphrim007: no, but they require different parts of the zuul conf | 17:03 |
caphrim007 | tobiash: thanks! | 17:04 |
*** gothicmindfood has quit IRC | 17:05 | |
tobiash | Shrews, corvus, clarkb: the dstat of zuul looks more iops bound than cpu bound | 17:07 |
corvus | caphrim007: (it's designed so you can use the same conf file everywhere) | 17:07 |
tobiash | http://logs.openstack.org/00/610100/3/check/tox-py36/6a8a11d/dstat.html.gz | 17:08 |
caphrim007 | corvus: roger that | 17:08 |
clarkb | tobiash: at least for that particular job I don't think it is, if you scroll the window to the left you'll see there is a spike in iops | 17:09 |
clarkb | tobiash: and then the rest of it runs under that spike | 17:09 |
clarkb | implying we don't hit a limit there | 17:09 |
*** gothicmindfood has joined #zuul | 17:09 | |
clarkb | at the same time the load average is very high for an 8vcpu host | 17:10 |
clarkb | maybe its both things! | 17:10 |
tobiash | clarkb: but cpu is below 100% constantly | 17:11 |
clarkb | ya | 17:11 |
Shrews | if it isn't cpu, we'll have to look at setting up zookeeper with a tmpfs for our tests. but that's the harder thing to do, so let's try the cpu thing first | 17:11 |
clarkb | another thing I notice is that there are a lot of sockets open. Is it possible we are running into ulimit errors? | 17:11 |
tobiash | clarkb: and the load is a combination of cpu and io on linux | 17:11 |
tobiash | That also might be a problem | 17:12 |
clarkb | tobiash: the wai cpu time should indicate waiting on io (or other syscalls) right? | 17:12 |
tobiash | Not necessarily | 17:12 |
clarkb | I guess if you are running async it wouldn't show up there? | 17:12 |
clarkb | because you are polling | 17:12 |
corvus | it should as long as there is some idle time (and there is), so it should be fairly reliable indication of iowait | 17:13 |
corvus | (if there's no idle time, you can't rely on iowait) | 17:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: DNM: testing zookeeper oddities https://review.openstack.org/612750 | 17:13 |
clarkb | trying to accommodate the number of sockets/fds is probably worth doing anyway to avoid ulimit problems on say your laptop | 17:14 |
tobiash | But nevertheless it's very io heavy as it's constantly doing between 500 and 1000 iops | 17:16 |
*** rfolco|brb is now known as rfolco|rucker | 17:16 | |
clarkb | yes | 17:16 |
Shrews | those graphs would be handy to have for nodepool | 17:17 |
Shrews | how is that enabled? | 17:19 |
clarkb | I think its an unmerged change to the zuul tox job to add it? | 17:19 |
tobiash | Shrews: https://review.openstack.org/#/c/610100 | 17:19 |
clarkb | fwiw running zk on a tmpfs may not be too difficult since we use the tools/test-setup.sh method of running zk | 17:19 |
clarkb | I'll work on making that happen | 17:20 |
Shrews | clarkb: that isn't used in nodepool | 17:21 |
Shrews | we should actually remove that | 17:21 |
tobiash | Maybe in a pre playbook like dstat? | 17:21 |
Shrews | oh, np doesn't have that | 17:21 |
clarkb | Shrews: remove the test setup? that is how mysql is configured | 17:22 |
clarkb | Shrews: and that is why nodepool probably doesn't have it | 17:22 |
clarkb | I don't think you can remove it from zuul if you want to test the mysql reporting driver | 17:22 |
Shrews | clarkb: i was thinking of only nodepool | 17:22 |
corvus | do folks like the dstat thing? should we start thinking about how to do it for realz? | 17:22 |
Shrews | but looking in zuul code (duh) | 17:23 |
Shrews | corvus: seems handy for this use case, at least | 17:23 |
tobiash | corvus: we have something like this in the base job that can be enabled by a job var | 17:25 |
corvus | if we want the data for nodepool, the most expeditious thing would just be to copy that change to the nodepool repo for now... i can do that real quick | 17:26 |
corvus | tobiash: what do you use to generate the report? | 17:26 |
tobiash | We use sar for gathering and 'sadf -g' for generating svg graphs | 17:26 |
corvus | i really like the thing in my change because it's all static js that just gets concatenated into a single file. it's very simple and self-contained. however, it isn't published anywhere, so we'd have to figure out a distribution mechanism | 17:27 |
corvus | tobiash: is that something you could share? | 17:27 |
tobiash | Of course | 17:27 |
Shrews | corvus: thx. i can rebase on top of that change | 17:27 |
tobiash | corvus: I can share that later (not at laptop atm) | 17:29 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool master: WIP: Run dstat and generate graphs in unit tests https://review.openstack.org/612765 | 17:30 |
corvus | Shrews: ^ | 17:30 |
corvus | tobiash: cool, thanks | 17:30 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: DNM: testing zookeeper oddities https://review.openstack.org/612750 | 17:31 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Run zookeeper datadir on tmpfs during testing https://review.openstack.org/612766 | 17:33 |
clarkb | it isn't too bad to use a tmpfs ^ if we want to go that route | 17:33 |
clarkb | could also look into eatmydata and other write() fast returners | 17:33 |
*** jpena is now known as jpena|off | 17:34 | |
clarkb | Shrews: if ^ resutls in more stability we could do similar for nodepool | 17:34 |
Shrews | not quite as simple in nodepool, but yeah. hope it produces good results | 17:35 |
corvus | Shrews: what sets up zk in the unit tests in nodepool? | 17:36 |
Shrews | corvus: nothing. it's installed from bindep | 17:36 |
Shrews | so we assume it's running | 17:37 |
corvus | Shrews: ah gotcha | 17:37 |
corvus | Shrews: i think that's the same for zuul, so it might just be a matter of adding a test-setup.sh script with the sed in it | 17:38 |
Shrews | corvus: yeah, that's what i was thinking | 17:38 |
Shrews | corvus: where is that script called in zuul? | 17:39 |
clarkb | it is part of the base tox jobs iirc | 17:39 |
corvus | ya | 17:39 |
corvus | so if it exists, it'll get run in tox-pyxx automatically | 17:39 |
Shrews | oh, that's convenient | 17:40 |
Shrews | so not so bad then | 17:40 |
clarkb | we might want to add noatime to that set of mount options too | 17:40 |
Shrews | clarkb: ++ | 17:40 |
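As a sketch of the tools/test-setup.sh direction discussed above (the datadir path, zoo.cfg location, and service name are guesses at a typical distro ZooKeeper layout, not taken from the merged change):

```shell
#!/bin/bash -xe
# tools/test-setup.sh (sketch): move ZooKeeper's datadir onto tmpfs so unit
# tests are not bound by disk iops, then point zoo.cfg at the new location.
ZK_DATA=/var/lib/zookeeper-tmpfs
sudo mkdir -p "$ZK_DATA"
sudo mount -t tmpfs -o nodev,nosuid,noatime tmpfs "$ZK_DATA"
sudo chown zookeeper:zookeeper "$ZK_DATA"
sudo sed -i "s|^dataDir=.*|dataDir=$ZK_DATA|" /etc/zookeeper/conf/zoo.cfg
sudo service zookeeper restart
```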
clarkb | let's see if that actually works (I'm slightly worried about file permissions but the mount that happened locally for me seemed to just work) | 17:42 |
Shrews | ok, i've got an equivalent change for nodepool, but i'm not going to push it up just ye | 17:46 |
Shrews | yet | 17:46 |
*** nilashishc has quit IRC | 17:59 | |
SpamapS | Hey everyone, I'm starting to poke at converting my kubernetes yamls for zuul+nodepool (based on tobiash's openshift submission) into helm charts. Just wondering if anybody has headed down that route before I go there. | 18:01 |
*** chandankumar is now known as chkumar|off | 18:37 | |
*** panda has quit IRC | 18:45 | |
*** panda has joined #zuul | 18:45 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Merger: automatically add new hosts to the known_hosts file https://review.openstack.org/608453 | 18:57 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Merger: automatically add new hosts to the known_hosts file https://review.openstack.org/608453 | 18:59 |
*** jtanner has joined #zuul | 19:19 | |
tobiash | corvus: do you still use xenial nodes in openstack? | 19:22 |
clarkb | tobiash: yes we run zuul on xenial and many of the tests still run on xenial | 19:23 |
*** pcaruana has quit IRC | 19:23 | |
tobiash | corvus: I'm asking because the svg generation of sadf needs sysstat at least in version 11.4.0 which is too old in xenial (I think 16.10 was the first that shipped sysstat in a version that can generate svg) | 19:24 |
tobiash | so we're currently running the svg generation in docker in an alpine image ;) | 19:24 |
tobiash | so to use that in openstack we'd maybe have to host the sadf binary somewhere (or add it to the xenial nodes in a newer version) | 19:25 |
tobiash | or use a different method of svg generation | 19:25 |
corvus | tobiash: i think it's also ok if it only works on newer systems | 19:26 |
tobiash | ok, that makes it easier | 19:26 |
*** openstackgerrit has quit IRC | 20:06 | |
*** caphrim007 has quit IRC | 20:30 | |
SpamapS | corvus: how would you feel about adding something to zuul that lets it read any config option from envvars (constructed from section+key)? Asking because that makes kubernetes deployments of zuul a lot simpler. | 20:32 |
SpamapS | I had to jump through a lot of hoops to get secrets into the containers ... porting that to helm chart is shining a light on how this could be made quite a bit simpler. | 20:32 |
corvus | SpamapS: to be honest, that sounds terrible -- like, we must be missing something. surely kubernetes can handle running apps with *config files* ? | 20:33 |
SpamapS | Nope. | 20:36 |
SpamapS | It can handle config *file* | 20:36 |
*** ssbarnea_ has quit IRC | 20:36 | |
dmsimard | SpamapS: you can set environment for Ansible tasks, you don't need Zuul for that | 20:36 |
SpamapS | But when you have some bits coming from config maps, some from secrets, and others from other deployment pieces, you have to assemble that config file in a very frustrating way. | 20:36 |
SpamapS | dmsimard: I am setting things like the github app secret. | 20:37 |
SpamapS | Ansible isn't even in the picture yet | 20:37 |
SpamapS | Or the mysql db password. | 20:37 |
SpamapS | Now | 20:37 |
corvus | (to elaborate on why it rubs me the wrong way -- i feel like we're finally getting to the point where we can somewhat concisely instruct people on how to set up zuul, and forking that process so there are two completely different ways of configuring zuul is counter-productive. being able to talk with people about "the zuul config file" and not have that be a mystery depending on the deployment tech would be | 20:37 |
corvus | great) | 20:37 |
SpamapS | The problem is that in order to build that file, you have to pull things from many different sources. | 20:38 |
SpamapS | We can also just make a wrapper that does what I described. | 20:38 |
SpamapS | Or, we can enable ConfigParser environment interpolation. | 20:38 |
SpamapS | Which I discovered after asking that question.. just now. ;) | 20:38 |
dmsimard | SpamapS: this would not work ? https://gist.github.com/dmsimard/7e8753b252de7cc9380c2b4d5ad2f6f9 | 20:39 |
SpamapS | http://paste.openstack.org/show/732855/ <-- this patch actually makes it so you can reference the environment in zuul.conf | 20:39 |
SpamapS | with %(ENV_VAR_NAME)s | 20:39 |
SpamapS | so maybe that's a happy medium? | 20:40 |
SpamapS | since you still have "a zuul config file" | 20:40 |
corvus | SpamapS: i see the problem you describe; that solution sounds like maybe a good compromise | 20:40 |
SpamapS | but you can feed variable things in via the environment | 20:40 |
corvus | yeah, it's a bit more explicit and less magic -- it should be pretty easy to understand/debug | 20:40 |
SpamapS | Indeed, and doesn't change by way of deployment tool. | 20:41 |
corvus | ++ | 20:41 |
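The mechanism under discussion boils down to seeding ConfigParser's defaults with the process environment so that %(ENV_VAR_NAME)s references in zuul.conf resolve at read time. A minimal, self-contained sketch (not the patch from the paste; the GITHUB_WEBHOOK_TOKEN variable is hypothetical):

```python
import configparser
import os

# Hypothetical variable; in a container it would come from a Kubernetes
# secret exposed through the pod's environment.
os.environ.setdefault('GITHUB_WEBHOOK_TOKEN', 'example-token')

# Escape literal '%' so environment values cannot break interpolation, then
# seed the parser's DEFAULT section with the whole environment.
env = {key: value.replace('%', '%%') for key, value in os.environ.items()}
config = configparser.ConfigParser(defaults=env)

# In real use this would be config.read('/etc/zuul/zuul.conf').
config.read_string("""
[connection github]
driver = github
webhook_token = %(GITHUB_WEBHOOK_TOKEN)s
""")

print(config.get('connection github', 'webhook_token'))  # -> example-token
```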
dmsimard | SpamapS: oh, it's not for a job, it's for actually deploying zuul ? | 20:41 |
SpamapS | reading up on the caveats now | 20:41 |
SpamapS | since I literally just learned about this 120 seconds ago. | 20:41 |
SpamapS | dmsimard: correct | 20:41 |
dmsimard | ok, my bad, I had the job use case in mind | 20:41 |
SpamapS | Yeah, and I'm explicitly avoiding ansible for any of it just to avoid ansibleception. | 20:42 |
dmsimard | fair :) | 20:42 |
dmsimard | so you're using puppet? jk | 20:42 |
SpamapS | (though it looks like ansible+k8s should be a lot simpler in ansible 2.7) | 20:42 |
dmsimard | yeah.. the awx installer uses ansible to deploy itself in k8s/openshift | 20:43 |
clarkb | SpamapS: fwiw https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#add-configmap-data-to-a-volume implies you could write multiple config files | 20:43 |
clarkb | the value is the file content an the key the filename at that volume mount point | 20:43 |
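What clarkb is pointing at looks roughly like this: every key in a ConfigMap's data becomes a file under the volume's mountPath. The names below are illustrative, not from an actual deployment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zuul-config
data:
  zuul.conf: |
    [scheduler]
    tenant_config=/etc/zuul/main.yaml
  main.yaml: |
    - tenant:
        name: example
# In the pod spec, mounting the ConfigMap exposes each key as a file:
#   volumes:
#   - name: zuul-config
#     configMap:
#       name: zuul-config
#   containers:
#   - name: scheduler
#     volumeMounts:
#     - name: zuul-config
#       mountPath: /etc/zuul
```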
SpamapS | clarkb: configmaps and secrets cannot land in the same dir, nor can they be symlinked reliably. | 20:46 |
SpamapS | which is exactly the frustration I landed on | 20:46 |
SpamapS | zuul.conf ended up having to be a secret entirely. | 20:46 |
clarkb | how are you suopposed to use secrets with config | 20:46 |
clarkb | thats seems broken | 20:46 |
SpamapS | YES | 20:46 |
SpamapS | secrets are carefully handled in such a way that they are very hard to compromise accidentally and don't end up on disk ever. | 20:47 |
SpamapS | but that makes them a bit rigid. | 20:47 |
SpamapS | I believe you can probably figure out a way to make something like zuul.conf in one dir, and zuul-secure.conf in the same dir. | 20:48 |
dmsimard | secrets are also very securely "encrypted" in base64 in etcd :( | 20:48 |
SpamapS | but.. I'm just thinking, I kind of like the idea of just sticking them in the environment. | 20:48 |
clarkb | dmsimard: and etcd doesn't have read acls (or did v3 add that?) | 20:48 |
dmsimard | not sure, openshift has acls for sure but I'm not sure if that comes from etcd or k8s | 20:48 |
clarkb | SpamapS: ya if that works and configparser supports it sanely seems like a reasonable approach | 20:49 |
SpamapS | etcd3 does in fact have RBAC | 20:49 |
clarkb | nice | 20:49 |
SpamapS | but IIRC the recommendation is to replace that secret storage with something better | 20:50 |
SpamapS | https://kubernetes.io/docs/concepts/configuration/secret/#protections | 20:53 |
SpamapS | FYI, if interested | 20:53 |
*** openstackgerrit has joined #zuul | 20:54 | |
openstackgerrit | Merged openstack-infra/zuul master: Run zookeeper datadir on tmpfs during testing https://review.openstack.org/612766 | 20:54 |
clarkb | I guess ^ worked thats neat | 20:54 |
SpamapS | and looks like they have a path to encrypted-at-rest secrets | 20:54 |
SpamapS | https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/ | 20:55 |
*** manjeets has joined #zuul | 21:01 | |
SpamapS | hm.. so with a few more lines of code, instead of doing %(ENV_VAR)s, we could instead have $ENV_VAR and ${ENV_VAR} work.. | 21:01 |
SpamapS | the latter would be less surprising. Like, people might just try that without reading the docs. | 21:02 |
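One way the ${ENV_VAR} form could be implemented (a sketch of the general idea, not the eventual Zuul change) is a custom interpolation class that runs os.path.expandvars over values after the normal %-interpolation; the ZUUL_DB_PASSWORD variable is hypothetical:

```python
import configparser
import os


class EnvInterpolation(configparser.BasicInterpolation):
    """Additionally expand $VAR and ${VAR} from the process environment."""

    def before_get(self, parser, section, option, value, defaults):
        value = super().before_get(parser, section, option, value, defaults)
        return os.path.expandvars(value)


os.environ.setdefault('ZUUL_DB_PASSWORD', 'example')
config = configparser.ConfigParser(interpolation=EnvInterpolation())
config.read_string("""
[database]
dburi = mysql+pymysql://zuul:${ZUUL_DB_PASSWORD}@db/zuul
""")
print(config.get('database', 'dburi'))
```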
*** caphrim00_ has joined #zuul | 21:03 | |
*** caphrim00_ has quit IRC | 21:07 | |
*** caphrim007 has joined #zuul | 21:08 | |
caphrim007 | hey folks, can i ask nodepool questions here? or is that another channel? | 21:09 |
corvus | SpamapS: that could be a thing | 21:09 |
corvus | caphrim007: this is the place | 21:09 |
caphrim007 | corvus: thanks. had a question. if I have a min-ready of 1 in my nodepool.conf and an existing node created in my openstack. i manually deleted the node in openstack, and expected that nodepool would rectify that. was that an incorrect assumption? didn't seem to "work" until i did nodepool delete foo | 21:10 |
caphrim007 | the existing node had been created via nodepool btw | 21:11 |
corvus | caphrim007: ah nope, i don't think nodepool will detect that (it doesn't expect nodes to be deleted from under it) | 21:12 |
caphrim007 | ahh ok. thanks for the clarification! | 21:12 |
corvus | caphrim007: but nodepool delete will delete the node in nodepool and openstack | 21:12 |
clarkb | it should eventually cleanup the node from its db (and in the cloud if necessary) after the ready timeout is reached though | 21:12 |
clarkb | but that is a long timeout by default | 21:12 |
corvus | clarkb: oh, right, that's true. but that's like 8+ hours i think | 21:13 |
clarkb | (8hours?) | 21:13 |
corvus | it's more likely to just try to use the node and fail before then | 21:13 |
caphrim007 | ok. that brings me to another question actually corvus. how does zuul instruct nodepool to create something? does it use an api for that? because i see no thing like "nodepool create foo" | 21:13 |
corvus | caphrim007: yes, zuul puts requests into zookeeper and nodepool handles them. there isn't an external interface to this api yet (probably some day, but we still consider it a private api for now while we iron it out) | 21:14 |
corvus | caphrim007: you can inspect it with 'nodepool request-list' | 21:14 |
corvus | that will show any pending requests from zuul | 21:15 |
caphrim007 | oh, interesting | 21:15 |
manjeets | Hello zuul community I'm trying to set up a 3rd party job for an opensource project one of openstack projects, and found this https://zuul-ci.org/docs/zuul/admin/quick-start.html#quick-start | 21:17 |
manjeets | I can disable gerrit service from this and configure it to point to gerrit for opensource project ? | 21:17 |
corvus | caphrim007: here's the current output from openstack's nodepool if you are curious: http://paste.openstack.org/show/732858/ | 21:18 |
corvus | manjeets: yes, you should be able to do that. the zuul.conf file is bind-mounted into the container, so you can just edit it in your local directory. you can remove the gerrit container from the docker-compose file to avoid running it. | 21:20 |
corvus | manjeets: note that it will leave "localhost" links for the logs, so you will also need to change the log url to something that other people can access. | 21:21 |
manjeets | corvus thanks, should i just remove gerrit from docker-compose or do I have to delete the gerrit-config tag as well? If I understood correctly, i'll point zuul.conf at the gerrit event stream of the upstream patches? | 21:22 |
corvus | caphrim007: i forgot (until i pasted that output) that even min-ready nodes go through the request system, so you should be able to see that too. | 21:22 |
caphrim007 | corvus: alrighty. thanks! | 21:23 |
corvus | manjeets: you can remove gerrit-config as well | 21:23 |
caphrim007 | corvus: is main.yaml used by all the zuul components too? or only select ones? | 21:23 |
corvus | manjeets: and yes, you can update zuul.conf to connect to an upstream gerrit instead of the local one. | 21:23 |
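A sketch of the zuul.conf edit being described, using the Gerrit driver's documented connection options; the host, account name, and key path are placeholders:

```ini
[connection gerrit]
driver=gerrit
server=review.openstack.org
baseurl=https://review.openstack.org
user=my-third-party-ci
sshkey=/var/lib/zuul/.ssh/id_rsa
```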
corvus | caphrim007: it's only used by the scheduler to bootstrap the rest of the configuration | 21:24 |
caphrim007 | kk | 21:24 |
caphrim007 | ahh yeah. there it is in zuul.conf. duh tim | 21:24 |
manjeets | corvus thanks I'm building that right now might come back to ask about issue if I run into any ! | 21:24 |
corvus | manjeets: great, we're happy to help -- when you're done, maybe you can share your configuration -- other folks may be able to use it :) | 21:25 |
manjeets | corvus sure ! once I'm done I'll write documentation and make it public | 21:26 |
caphrim007 | corvus: is there a reference/code/anything that i can look at that covers the zuul.conf options? | 21:29 |
corvus | caphrim007: yes, i think they should all be covered on this page: https://zuul-ci.org/docs/zuul/admin/components.html | 21:33 |
corvus | caphrim007: (under each individual section) | 21:34 |
caphrim007 | corvus: ahh ok. is this config here, zuul_url, no longer a thing? https://github.com/openstack/windmill/blob/306602fc0c267837e2a4af68e510e1e7b705871b/config/zuul/zuul.conf.j2#L42 | 21:37 |
corvus | caphrim007: correct -- i think that was for zuul v2 | 21:38 |
caphrim007 | k | 21:38 |
*** spsurya has quit IRC | 21:38 | |
corvus | caphrim007: oh, there's also a little more zuul.conf option documentation in the drivers pages: https://zuul-ci.org/docs/zuul/admin/connections.html#drivers | 21:38 |
corvus | caphrim007: for example, to configure a github connection, here are the docs: https://zuul-ci.org/docs/zuul/admin/drivers/github.html#connection-configuration | 21:39 |
corvus | (those are in separate files -- one per driver -- since the drivers are supposed to be self-contained) | 21:39 |
caphrim007 | ahh yes, right right. i'm just doing some rectifying between windmill and what i see in the current zuul docs | 21:40 |
clarkb | ianw: if you have a sec can you review https://review.openstack.org/#/c/609829/5 ? it adds the port cleanups to nodepool itself | 21:45 |
openstackgerrit | Clark Boylan proposed openstack-infra/nodepool master: Run test zookeeper on top of tmpfs https://review.openstack.org/612816 | 21:49 |
clarkb | Shrews: corvus ^ didn't see one of those get pushed yet so went ahead and did it atop the dstat change to see if we notice a difference | 21:49 |
clarkb | looking at http://logs.openstack.org/16/612816/1/check/tox-py36/f346739/dstat.html (tmpfs) vs http://logs.openstack.org/65/612765/1/check/tox-py36/d2d81c3/dstat.html (not tmpfs) we do drastically reduce the iops | 21:58 |
clarkb | whether or not that has a hand in making the tmpfs change pass vs the failure in the not tmpfs run hard to say | 21:58 |
clarkb | the dstat info for the not tmpfs run doesn't actually look all that bad | 21:58 |
clarkb | load is low, plenty of cpu idle time, low memory usage etc | 22:00 |
ianw | clarkb: i will take a closer look today. i was wondering if this is actually something nodepool should work around, or if it was a very specific problem | 22:06 |
clarkb | ianw: we've seen it on other clouds too (like packethost recently, but also hpcloud way back when iirc) | 22:07 |
clarkb | I expect it will be a useful thing to have nodepool understand :/ | 22:07 |
ianw | i guess "this shouldn't happen but does" is the raison d'être of openstacksdk, and nodepool to some extent | 22:08 |
clarkb | http://logs.openstack.org/16/612816/1/check/tox-py35/70a3157/job-output.txt.gz I think that rules out iowait as the cause of the zk problems in nodepool test suite | 22:09 |
clarkb | I wonder what our ulimit is there | 22:09 |
*** caphrim007 has quit IRC | 22:21 | |
clarkb | reading the failed test logs again the issue is in wait_for_threads | 22:32 |
clarkb | we actually do build the image and boot a node that we are waiting on but there must be some unexpected background thread running that holds us up | 22:33 |
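For context, the wait_for_threads check clarkb mentions is essentially a leak detector run at the end of each test: it fails if background threads outside a known whitelist are still alive. A simplified sketch of the idea (not nodepool's actual helper or whitelist):

```python
import threading
import time

# Threads the harness expects to still be running when a test finishes.
EXPECTED_THREADS = {'MainThread'}


def wait_for_threads(timeout=10):
    """Wait for stray background threads to exit; fail the test if any leak."""
    deadline = time.time() + timeout
    leaked = []
    while time.time() < deadline:
        leaked = [t.name for t in threading.enumerate()
                  if t.name not in EXPECTED_THREADS]
        if not leaked:
            return
        time.sleep(0.1)
    raise AssertionError('Leaked threads: %s' % leaked)
```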
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add the process environment to zuul.conf parser https://review.openstack.org/612824 | 22:47 |
clarkb | ok I've tried to reproduce that locally and am having a very hard time. Seems to be fine from here | 22:56 |
clarkb | ran that specific test ~30 times and now running the full test suite to see if it is an interaction between test threads | 22:57 |
*** threestrands has joined #zuul | 23:02 | |
openstackgerrit | Clark Boylan proposed openstack-infra/nodepool master: Do not merge https://review.openstack.org/612828 | 23:08 |
clarkb | I really dislike ^ but unsure of where else to look since local reproduction isn't working | 23:08 |
*** rlandy is now known as rlandy|bbl | 23:17 | |
clarkb | at least it caught one | 23:24 |
clarkb | ok I think the real-cloud thread is leaking across tests | 23:26 |
clarkb | I'm going to guess this is a side effect of the openstacksdk release that happened | 23:29 |
clarkb | because openstacksdk is going to run a thread for the api request throttling? | 23:29 |
ianw | clarkb: so you're saying the new thread hasn't been skipped in the wait? | 23:51 |
clarkb | ianw: yes, though reading our task manager and nodepool fixtures I expect that this thread should be stopped | 23:56 |
clarkb | ianw: now that I have that info I'm trying to hack together a reproduction locally by running the sdk integration test before the webapp test | 23:57 |
clarkb | currently trying to figure out how to enforce test order | 23:57 |