*** jimi|ansible has joined #zuul | 00:01 | |
mordred | SpamapS: re: markdownlint | 00:16 |
mordred | SpamapS: the pre playbook runs npm - but there isn't anything ensuring npm exists ... we have an install-nodejs role, but that's for installing from nodesource and might be overkill | 00:17 |
mordred | SpamapS: NEVERMIND - it has been taken care of and also covered in previous review comments | 00:18 |
mordred | SpamapS: maybe one day I'll learn to read. probably not - but maybe | 00:18 |
clarkb | reading is hard | 00:20 |
mordred | clarkb: srrsly | 00:20 |
*** tobiash has quit IRC | 00:21 | |
*** ssbarnea has joined #zuul | 00:22 | |
*** tobiash has joined #zuul | 00:23 | |
mordred | Shrews, corvus: I had this thought about nodepool config ... it happened as I was thinking about the changes to the rate-limiting config that are/will be possible once the current stack finishes landing | 00:27 |
mordred | Shrews, corvus: in the ansible modules, we recently added the ability to pass an entire clouds.yaml style cloud config dict to the cloud parameter | 00:27 |
mordred | so - if it's a scalar, it's a cloud name, but if it's a dict, we pass it in to the constructor as kwargs | 00:27 |
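To make the scalar-or-dict distinction concrete, the two forms look roughly like this in an Ansible task; this is an illustrative sketch, and the cloud name, auth values, image, and flavor are placeholders rather than values from this discussion:

```yaml
# Scalar form: "cloud" names an entry in clouds.yaml
- os_server:
    cloud: mycloud
    name: test-node
    image: ubuntu-16.04
    flavor: m1.small

# Dict form: a whole clouds.yaml-style config is passed through
# to the cloud constructor as kwargs
- os_server:
    cloud:
      auth:
        auth_url: https://keystone.example.com/v3
        username: nodepool
        password: "{{ vaulted_cloud_password }}"
        project_name: ci
      region_name: RegionOne
    name: test-node
    image: ubuntu-16.04
    flavor: m1.small
```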
mordred | Shrews, corvus: do you think adding a similar thing to nodepool would be an improvement or just confusing and terrible? | 00:28 |
mordred | (the thing that made me start thinking about this in the first place is that, with the next openstacksdk release, it will be possible to configure rate limits in clouds.yaml - and also to configure them separately per-service) | 00:29 |
mordred | I'm not sure whether adding per-service rate-limit support directly to nodepool's config language is a valuable thing though - or whether a passthrough cloud dict is either (we do it in ansible because people want to store their config in vault and whatnot and the clouds.yaml files get confusing in that context) | 00:31 |
mordred | anyway- not urgent or anything - just thoughts happening while dealing with airplanes | 00:31 |
corvus | mordred: but with rate limits going into openstacksdk(/keystone?) wouldn't the most desirable future state be one where we retire the use of rate limit stuff in nodepool and defer that entirely to clouds.yaml? i see this as an opportunity to further simplify nodepool config... | 00:35 |
mordred | corvus: yah - that's probably the right option | 00:36 |
Shrews | i feel like moving that out of nodepool is the clearer thing to do, too | 00:36 |
corvus | mordred: to answer the question another way -- if there's a big win by supporting the scalar-or-dict thing, i think it would work fine, however, right now it strikes me as complexity without gain, and configuring this stuff is hard enough as-is... it'd be an easy sell though if we find the right use case | 00:36 |
mordred | ++ | 00:37 |
mordred | cool. less patches for me to write :) | 00:37 |
Shrews | those were the exact words i had in my head | 00:37 |
Shrews | corvus: quick! what number am i thinking of? | 00:37 |
corvus | Shrews: -2? | 00:38 |
Shrews | sorry, the answer was "blue". i feel safer now | 00:39 |
corvus | Shrews: are you sure it wasn't yellow? | 00:39 |
Shrews | O.o | 00:39 |
Shrews | tin foil hat time | 00:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Consume rate limiting task manager from openstacksdk https://review.openstack.org/612169 | 00:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Remove task manager https://review.openstack.org/612170 | 00:46 |
*** gouthamr has joined #zuul | 00:47 | |
*** dmellado has joined #zuul | 00:51 | |
*** rlandy has quit IRC | 01:27 | |
dmsimard | I'm trying ansible-runner out of curiosity | 01:27 |
dmsimard | and it's not bad | 01:28 |
*** bhavikdbavishi has joined #zuul | 01:43 | |
*** bhavikdbavishi has quit IRC | 01:50 | |
*** bhavikdbavishi has joined #zuul | 02:23 | |
*** bhavikdbavishi has quit IRC | 02:40 | |
*** bhavikdbavishi has joined #zuul | 03:35 | |
*** mrhillsman is now known as openlab | 04:05 | |
*** openlab is now known as mrhillsman | 04:06 | |
*** rfolco|rover has quit IRC | 04:30 | |
*** spsurya has joined #zuul | 05:48 | |
AJaeger | zuul cores, time to remove an ancient role from zuul-jobs - please review https://review.openstack.org/610381 | 06:21 |
*** bhavikdbavishi1 has joined #zuul | 06:53 | |
*** bhavikdbavishi has quit IRC | 06:55 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:55 | |
*** pcaruana has joined #zuul | 06:56 | |
*** themroc has joined #zuul | 07:24 | |
*** sshnaidm|afk is now known as sshnaidm|pto | 07:41 | |
*** electrofelix has joined #zuul | 08:29 | |
*** gouthamr has quit IRC | 08:32 | |
*** dmellado has quit IRC | 08:34 | |
*** cbx33 has joined #zuul | 08:58 | |
*** nilashishc has joined #zuul | 09:09 | |
*** gouthamr has joined #zuul | 09:11 | |
*** cbx33 has quit IRC | 09:17 | |
*** dmellado has joined #zuul | 09:18 | |
*** jpena|off is now known as jpena | 09:22 | |
*** rfolco has joined #zuul | 10:12 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove the "emit-ara-html" role https://review.openstack.org/610381 | 10:14 |
*** ssbarnea_ has joined #zuul | 10:16 | |
*** ianychoi has quit IRC | 10:22 | |
*** ianychoi has joined #zuul | 10:25 | |
*** rfolco is now known as rfolco|rucker | 11:23 | |
*** panda is now known as panda|lunch | 11:27 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** nilashishc has quit IRC | 11:39 | |
*** nilashishc has joined #zuul | 12:19 | |
*** bhavikdbavishi has quit IRC | 12:22 | |
*** rlandy has joined #zuul | 12:29 | |
*** jpena|lunch is now known as jpena | 12:40 | |
corvus | tobiash: do you still see the same error in https://review.openstack.org/597147 ? | 13:43 |
tobiash | corvus: will re-do the check | 13:50 |
openstackgerrit | Merged openstack-infra/zuul master: web: Increase height and padding of zuul-job-result https://review.openstack.org/610980 | 13:51 |
tobiash | corvus: regarding 610029: iterating over the node requests took so long even with caching (we have probably more providers than is good currently in one instance) | 13:52 |
tobiash | but nevertheless we had a very long list of open requests and quota calculation for each request took a few seconds even with znode caching | 13:53 |
tobiash | so I think this safety net is still useful in case of overload situations | 13:53 |
openstackgerrit | Merged openstack-infra/zuul master: encrypt_secret: support OpenSSL 1.1.1 https://review.openstack.org/611414 | 13:54 |
corvus | tobiash: ok. i'm fine merging that and then continuing to work to speed things up. | 13:58 |
tobiash | corvus: thanks, that's also my main focus area atm | 13:59 |
tobiash | (stability and performance) | 14:01 |
goern | tobiash, are you running jobs on VM or in pods? | 14:01 |
tobiash | goern: our jobs run on vms | 14:01 |
tobiash | zuul in pods | 14:01 |
goern | uh, zuul itself in pods :) | 14:02 |
goern | need to talk to tristanC about putting software-factory in pods :) | 14:02 |
*** panda|lunch is now known as panda | 14:04 | |
* tobiash just increased the zuul-executor pods to 12 | 14:04 | |
goern | and that's a real bottleneck for me right now... the vm running the executors is way too slow :/ | 14:05 |
tobiash | corvus: I cannot reproduce the error in 597147 anymore. I guess the needed change in the api was not there yet in openstack at that time? | 14:10 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Ignore removed provider in _cleanupLeakedInstances https://review.openstack.org/608670 | 14:18 |
corvus | tobiash: that sounds plausible | 14:18 |
tobiash | corvus: so I changed -1 to +2 | 14:19 |
tristanC | goern: well the issue remains that zuul executor needs privileged pods, which iiuc is not acceptable with multi-tenant openshift deployment... | 14:23 |
tobiash | tristanC: true, so I have my own 'single-tenant' openshift deployment | 14:39 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Cleanup node requests that are declined by all current providers https://review.openstack.org/610915 | 14:48 |
goern | tristanC, tobiash do we know why and what privs the executors need? | 15:02 |
tobiash | goern: the executors use bwrap (a sandboxing tool) to jail the jobs | 15:03 |
openstackgerrit | Merged openstack-infra/zuul master: Exclude .keep files from .gitignore https://review.openstack.org/611990 | 15:03 |
openstackgerrit | Merged openstack-infra/zuul master: Add a sanity check for all refs returned by Gerrit https://review.openstack.org/599011 | 15:03 |
tobiash | that needs privs | 15:03 |
openstackgerrit | Merged openstack-infra/zuul master: Reload tenant in case of new project branches https://review.openstack.org/600088 | 15:03 |
goern | tobiash, don't we run jobs in pods?! so why do we need bwrap? | 15:04 |
mordred | goern: there is a hypothesis that it could be possible to have bwrap request less capabilities - but to my knowledge nobody has had the time to investigate whether or not that is actually possible | 15:04 |
mordred | goern: we wrap the execution of ansible-playbook, which runs on the executor, in bwrap | 15:04 |
mordred | as a defense in depth - to go along with ansible-level restrictions preventing local code execution | 15:05 |
tobiash | goern: no, the jobs run in the executor pods (and can do localhost stuff) | 15:05 |
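For readers unfamiliar with bwrap, the jail mordred and tobiash describe is conceptually a command line along these lines; this is an illustrative sketch of wrapping ansible-playbook in bubblewrap, not the actual invocation Zuul constructs, and the build directory and playbook paths are made up:

```shell
bwrap --die-with-parent \
      --unshare-all --share-net \
      --ro-bind /usr /usr --ro-bind /lib /lib \
      --ro-bind /etc/resolv.conf /etc/resolv.conf \
      --bind /var/lib/zuul/builds/example /work \
      --proc /proc --dev /dev --tmpfs /tmp \
      ansible-playbook /work/playbooks/run.yaml
```

Creating the namespaces and bind mounts is what needs extra capabilities under a default pod security context, which is why the executor pod currently has to run privileged.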
goern | oh ja... | 15:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Cleanup node requests that are declined by all current providers https://review.openstack.org/610915 | 15:17 |
openstackgerrit | Merged openstack-infra/zuul master: Use merger to get list of files for pull-request https://review.openstack.org/603287 | 15:25 |
openstackgerrit | Merged openstack-infra/zuul master: Add support for authentication/STARTTLS to SMTP https://review.openstack.org/603833 | 15:25 |
openstackgerrit | Merged openstack-infra/zuul master: encrypt_secret: Allow file scheme for public key https://review.openstack.org/581429 | 15:25 |
tristanC | goern: my proposal was to spawn a new executor pod for each job to remove the need for local bwrap isolation, but that's a substantial refactor... | 15:28 |
goern | tristanC, but I think that is the right way to move zuul forward into a cloud native world | 15:29 |
tristanC | there are more details in this thread: http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-July/000477.html | 15:31 |
corvus | yep, if anyone has time to look into the bwrap capabilities question mordred described, that's the next step. | 15:34 |
corvus | tristanC: are you planning on looking at the nodepool k8s functional test failures? it looks like it's catching a real bug in the k8s driver | 15:35 |
*** ianychoi_ has joined #zuul | 15:36 | |
tristanC | corvus: i'm still on vacation atm, and i probably won't have time for that before the summit | 15:37 |
corvus | tristanC: oh vacation! don't worry about it then! no rush. :) | 15:38 |
corvus | tristanC: you're just around so much it didn't seem like you were on vacation. ;) | 15:38 |
*** ianychoi has quit IRC | 15:40 | |
Shrews | corvus: speaking of that, does the direction in https://review.openstack.org/609515 seem sensible to you? | 15:48 |
Shrews | as far as organizing the tox tests | 15:48 |
Shrews | tl;dr.... current tests -> tests/unit/ , driver func tests -> tests/functional/<driver>/ | 15:50 |
Shrews | func tests will be a separate job per driver. didn't make sense to have a single job to set up *all* of the potential backends on a single node | 15:53 |
clarkb | The first paragraph of https://github.com/projectatomic/bubblewrap#user-namespaces is the important part re privilege vs unprivileged bwrap aiui | 15:55 |
clarkb | I think if your container provider considered user namespaces secure for unprivileged users then bwrap would run fine | 15:55 |
clarkb | for example on my tumbleweed machine where non root without setuid can run bwrap | 15:55 |
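A quick way to check whether a given host matches what clarkb describes (unprivileged user namespaces available, or a setuid bwrap binary); note the sysctl name below is the Debian/Ubuntu kernel knob and may not exist on other kernels:

```shell
# 1 means unprivileged users may create user namespaces (Debian/Ubuntu kernels)
sysctl kernel.unprivileged_userns_clone || true

# If bwrap is setuid root it does not need user namespaces at all
ls -l "$(command -v bwrap)"

# Smoke test: can an unprivileged user sandbox a trivial command?
bwrap --ro-bind / / --unshare-all true && echo "unprivileged bwrap works"
```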
corvus | Shrews: that looks great | 15:57 |
*** themroc has quit IRC | 16:00 | |
*** rfolco|rucker is now known as rfolco|brb | 16:01 | |
tobiash | goern: btw, nodepool builder also needs privs | 16:01 |
openstackgerrit | Merged openstack-infra/zuul master: web: add config-errors notifications drawer https://review.openstack.org/597147 | 16:07 |
Shrews | Has anyone else noticed that zk in our tests seems to be getting less and less reliable? | 16:39 |
Shrews | at least in nodepool. not sure about zuul | 16:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: DNM: testing zookeeper oddities https://review.openstack.org/612750 | 16:46 |
clarkb | Shrews: ya one theory I had was maybe running 8 test processes didn't give zookeeper time to get the cpu | 16:47 |
clarkb | not easy to test that, but did push up a change that runs numcpu - 1 test processes | 16:47 |
*** caphrim007 has joined #zuul | 16:48 | |
Shrews | clarkb: well, the case I just looked at, zookeeper connection was established, but then it was suspended for whatever reason | 16:48 |
Shrews | maybe still cpu issue? dunno | 16:49 |
clarkb | Could that happen if your connection times out? | 16:49 |
clarkb | I'm not sure how zk handles a timed out connection when you run without responding for too many ticks | 16:49 |
Shrews | maybe. but it was less than 2 seconds between connection and suspension | 16:51 |
Shrews | http://logs.openstack.org/98/605898/2/gate/tox-py36/9d5cf96/testr_results.html.gz | 16:51 |
Shrews | for reference | 16:51 |
corvus | zuul tests have been having issues too, which also are consistent with a cpu contention hypothesis | 16:52 |
Shrews | clarkb: where was your change you referenced? | 16:52 |
clarkb | https://review.openstack.org/#/c/561037/ | 16:52 |
Shrews | does that still work with stestr? | 16:53 |
clarkb | it should, but likely have to move the config to the stestr config file. /me reads some docs | 16:53 |
tobiash | Shrews, corvus: do you think it's cpu or io contention? | 16:54 |
Shrews | tobiash: we don't know what it is, thus our speculative poking :) | 16:55 |
clarkb | https://stestr.readthedocs.io/en/latest/MANUAL.html#user-config-files it may not dynamically evaluate it as shell though | 16:55 |
clarkb | it is configurable, but only to an integer value, not a "run this command to set the value" value | 16:56 |
Shrews | we could also do the command line option i suppose | 16:57 |
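The command-line route Shrews mentions would look something like the following; the exact arithmetic is an assumption about how one might leave headroom, not a quote from the change clarkb linked:

```shell
# Leave one CPU free for ZooKeeper/MySQL instead of spreading tests across all of them
stestr run --concurrency "$(( $(nproc) - 1 ))"
```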
tobiash | I'm probably a little bit biased towards io since I'm mostly dealing with io bottlenecks here | 16:57 |
clarkb | check the dstat addition to zuul jobs change? | 16:58 |
clarkb | fwiw it did look like there was disk io contention on ovh but not the other clouds | 16:58 |
Shrews | hrm, this was an ovh node | 16:59 |
clarkb | I think ovh limits by iops | 16:59 |
clarkb | so lots of small writes (probably what zk is doing actually) are "slow" but big writes are fast | 16:59 |
Shrews | but a similar failure on rax | 17:00 |
caphrim007 | are there any zuul services which *dont* require the zuul.conf? | 17:02 |
tobiash | caphrim007: no, but they require different parts of the zuul conf | 17:03 |
caphrim007 | tobiash: thanks! | 17:04 |
*** gothicmindfood has quit IRC | 17:05 | |
tobiash | Shrews, corvus, clarkb: the dstat of zuul looks more iops bound than cpu bound | 17:07 |
corvus | caphrim007: (it's designed so you can use the same conf file everywhere) | 17:07 |
tobiash | http://logs.openstack.org/00/610100/3/check/tox-py36/6a8a11d/dstat.html.gz | 17:08 |
caphrim007 | corvus: roger that | 17:08 |
clarkb | tobiash: at least for that particular job I don't think it is, if you scroll the window to the left you'll see there is a spike in iops | 17:09 |
clarkb | tobiash: and then the rest of it runs under that spike | 17:09 |
clarkb | implying we don't hit a limit there | 17:09 |
*** gothicmindfood has joined #zuul | 17:09 | |
clarkb | at the same time the load average is very high for an 8vcpu host | 17:10 |
clarkb | maybe its both things! | 17:10 |
tobiash | clarkb: but cpu is below 100% constantly | 17:11 |
clarkb | ya | 17:11 |
Shrews | if it isn't cpu, we'll have to look at setting up zookeeper with a tmpfs for our tests. but that's the harder thing to do, so let's try the cpu thing first | 17:11 |
clarkb | another thing I notice is that there are a lot of sockets open. Is it possible we are running into ulimit errors? | 17:11 |
tobiash | clarkb: and the load is a combination of cpu and io on linux | 17:11 |
tobiash | That also might be a problem | 17:12 |
clarkb | tobiash: the wai cpu time should indicate waiting on io (or other syscalls) right? | 17:12 |
tobiash | Not necessarily | 17:12 |
clarkb | I guess if you are running async it wouldn't show up there? | 17:12 |
clarkb | because you are polling | 17:12 |
corvus | it should as long as there is some idle time (and there is), so it should be fairly reliable indication of iowait | 17:13 |
corvus | (if there's no idle time, you can't rely on iowait) | 17:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: DNM: testing zookeeper oddities https://review.openstack.org/612750 | 17:13 |
clarkb | trying to accommodate the number of sockets/fds is probably worth doing anyway to avoid ulimit problems on say your laptop | 17:14 |
tobiash | But nevertheless it's very io heavy as it's constantly doing between 500 and 1000 iops | 17:16 |
*** rfolco|brb is now known as rfolco|rucker | 17:16 | |
clarkb | yes | 17:16 |
Shrews | those graphs would be handy to have for nodepool | 17:17 |
Shrews | how is that enabled? | 17:19 |
clarkb | I think its an unmerged change to the zuul tox job to add it? | 17:19 |
tobiash | Shrews: https://review.openstack.org/#/c/610100 | 17:19 |
clarkb | fwiw running zk on a tmpfs may not be too difficult since we use the tools/test-setup.sh method of running zk | 17:19 |
clarkb | I'll work on making that happen | 17:20 |
Shrews | clarkb: that isn't used in nodepool | 17:21 |
Shrews | we should actually remove that | 17:21 |
tobiash | Maybe in a pre playbook like dstat? | 17:21 |
Shrews | oh, np doesn't have that | 17:21 |
clarkb | Shrews: remove the test setup? that is how mysql is configured | 17:22 |
clarkb | Shrews: and that is why nodepool probably doesn't have it | 17:22 |
clarkb | I don't think you can remove it from zuul if you want to test the mysql reporting driver | 17:22 |
Shrews | clarkb: i was thinking of only nodepool | 17:22 |
corvus | do folks like the dstat thing? should we start thinking about how to do it for realz? | 17:22 |
Shrews | but looking in zuul code (duh) | 17:23 |
Shrews | corvus: seems handy for this use case, at least | 17:23 |
tobiash | corvus: we have something like this in the base job that can be enabled by a job var | 17:25 |
corvus | if we want the data for nodepool, the most expeditious thing would just be to copy that change to the nodepool repo for now... i can do that real quick | 17:26 |
corvus | tobiash: what do you use to generate the report? | 17:26 |
tobiash | We use sar for gathering and 'sadf -g' for generating svg graphs | 17:26 |
corvus | i really like the thing in my change because it's all static js that just gets concatenated into a single file. it's very simple and self-contained. however, it isn't published anywhere, so we'd have to figure out a distribution mechanism | 17:27 |
corvus | tobiash: is that something you could share? | 17:27 |
tobiash | Of course | 17:27 |
Shrews | corvus: thx. i can rebase on top of that change | 17:27 |
tobiash | corvus: I can share that later (not at laptop atm) | 17:29 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool master: WIP: Run dstat and generate graphs in unit tests https://review.openstack.org/612765 | 17:30 |
corvus | Shrews: ^ | 17:30 |
corvus | tobiash: cool, thanks | 17:30 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: DNM: testing zookeeper oddities https://review.openstack.org/612750 | 17:31 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Run zookeeper datadir on tmpfs during testing https://review.openstack.org/612766 | 17:33 |
clarkb | it isn't too bad to use a tmpfs ^ if we want to go that route | 17:33 |
clarkb | could also look into eatmydata and other write() fast returners | 17:33 |
*** jpena is now known as jpena|off | 17:34 | |
clarkb | Shrews: if ^ resutls in more stability we could do similar for nodepool | 17:34 |
Shrews | not quite as simple in nodepool, but yeah. hope it produces good results | 17:35 |
corvus | Shrews: what sets up zk in the unit tests in nodepool? | 17:36 |
Shrews | corvus: nothing. it's installed from bindep | 17:36 |
Shrews | so we assume it's running | 17:37 |
corvus | Shrews: ah gotcha | 17:37 |
corvus | Shrews: i think that's the same for zuul, so it might just be a matter of adding a test-setup.sh script with the sed in it | 17:38 |
Shrews | corvus: yeah, that's what i was thinking | 17:38 |
Shrews | corvus: where is that script called in zuul? | 17:39 |
clarkb | it is part of the base tox jobs iirc | 17:39 |
corvus | ya | 17:39 |
corvus | so if it exists, it'll get run in tox-pyxx automatically | 17:39 |
Shrews | oh, that's convenient | 17:40 |
Shrews | so not so bad then | 17:40 |
clarkb | we might want to add noatime to that set of mount options too | 17:40 |
Shrews | clarkb: ++ | 17:40 |
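As a sketch of the tools/test-setup.sh direction discussed above (the datadir path, zoo.cfg location, and service name are guesses at a typical distro ZooKeeper layout, not taken from the merged change):

```shell
#!/bin/bash -xe
# tools/test-setup.sh (sketch): move ZooKeeper's datadir onto tmpfs so unit
# tests are not bound by disk iops, then point zoo.cfg at the new location.
ZK_DATA=/var/lib/zookeeper-tmpfs
sudo mkdir -p "$ZK_DATA"
sudo mount -t tmpfs -o nodev,nosuid,noatime tmpfs "$ZK_DATA"
sudo chown zookeeper:zookeeper "$ZK_DATA"
sudo sed -i "s|^dataDir=.*|dataDir=$ZK_DATA|" /etc/zookeeper/conf/zoo.cfg
sudo service zookeeper restart
```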
clarkb | let's see if that actually works (I'm slightly worried about file permissions but the mount that happened locally for me seemed to just work) | 17:42 |
Shrews | ok, i've got an equivalent change for nodepool, but i'm not going to push it up just ye | 17:46 |
Shrews | yet | 17:46 |
*** nilashishc has quit IRC | 17:59 | |
SpamapS | Hey everyone, I'm starting to poke at converting my kubernetes yamls for zuul+nodepool (based on tobiash's openshift submission) into helm charts. Just wondering if anybody has headed down that route before I go there. | 18:01 |
*** chandankumar is now known as chkumar|off | 18:37 | |
*** panda has quit IRC | 18:45 | |
*** panda has joined #zuul | 18:45 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Merger: automatically add new hosts to the known_hosts file https://review.openstack.org/608453 | 18:57 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Merger: automatically add new hosts to the known_hosts file https://review.openstack.org/608453 | 18:59 |
*** jtanner has joined #zuul | 19:19 | |
tobiash | corvus: do you still use xenial nodes in openstack? | 19:22 |
clarkb | tobiash: yes we run zuul on xenial and many of the tests still run on xenial | 19:23 |
*** pcaruana has quit IRC | 19:23 | |
tobiash | corvus: I'm asking because the svg generation of sadf needs sysstat at least in version 11.4.0 which is too old in xenial (I think 16.10 was the first that shipped sysstat in a version that can generate svg) | 19:24 |
tobiash | so we're currently running the svg generation in docker in an alpine image ;) | 19:24 |
tobiash | so to use that in openstack we'd maybe have to host the sadf binary somewhere (or add it to the xenial nodes in a newer version) | 19:25 |
tobiash | or use a different method of svg generation | 19:25 |
corvus | tobiash: i think it's also ok if it only works on newer systems | 19:26 |
tobiash | ok, that makes it easier | 19:26 |
*** openstackgerrit has quit IRC | 20:06 | |
*** caphrim007 has quit IRC | 20:30 | |
SpamapS | corvus: how would you feel about adding something to zuul that lets it read any config option from envvars (constructed from section+key)? Asking because that makes kubernetes deployments of zuul a lot simpler. | 20:32 |
SpamapS | I had to jump through a lot of hoops to get secrets into the containers ... porting that to helm chart is shining a light on how this could be made quite a bit simpler. | 20:32 |
corvus | SpamapS: to be honest, that sounds terrible -- like, we must be missing something. surely kubernetes can handle running apps with *config files* ? | 20:33 |
SpamapS | Nope. | 20:36 |
SpamapS | It can handle config *file* | 20:36 |
*** ssbarnea_ has quit IRC | 20:36 | |
dmsimard | SpamapS: you can set environment for Ansible tasks, you don't need Zuul for that | 20:36 |
SpamapS | But when you have some bits coming from config maps, some from secrets, and others from other deployment pieces, you have to assemble that config file in a very frustrating way. | 20:36 |
SpamapS | dmsimard: I am setting things like the github app secret. | 20:37 |
SpamapS | Ansible isn't even in the picture yet | 20:37 |
SpamapS | Or the mysql db password. | 20:37 |
SpamapS | Now | 20:37 |
corvus | (to elaborate on why it rubs me the wrong way -- i feel like we're finally getting to the point where we can somewhat concisely instruct people on how to set up zuul, and forking that process so there are two completely different ways of configuring zuul is counter-productive. being able to talk with people about "the zuul config file" and not have that be a mystery depending on the deployment tech would be | 20:37 |
corvus | great) | 20:37 |
SpamapS | The problem is that in order to build that file, you have to pull things from many different sources. | 20:38 |
SpamapS | We can also just make a wrapper that does what I described. | 20:38 |
SpamapS | Or, we can enable ConfigParser environment interpolation. | 20:38 |
SpamapS | Which I discovered after asking that question.. just now. ;) | 20:38 |
dmsimard | SpamapS: this would not work ? https://gist.github.com/dmsimard/7e8753b252de7cc9380c2b4d5ad2f6f9 | 20:39 |
SpamapS | http://paste.openstack.org/show/732855/ <-- this patch actually makes it so you can reference the environment in zuul.conf | 20:39 |
SpamapS | with %(ENV_VAR_NAME)s | 20:39 |
SpamapS | so maybe that's a happy medium? | 20:40 |
SpamapS | since you still have "a zuul config file" | 20:40 |
corvus | SpamapS: i see the problem you describe; that solution sounds like maybe a good compromise | 20:40 |
SpamapS | but you can feed variable things in via the environment | 20:40 |
corvus | yeah, it's a bit more explicit and less magic -- it should be pretty easy to understand/debug | 20:40 |
SpamapS | Indeed, and doesn't change by way of deployment tool. | 20:41 |
corvus | ++ | 20:41 |
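The mechanism under discussion boils down to seeding ConfigParser's defaults with the process environment so that %(ENV_VAR_NAME)s references in zuul.conf resolve at read time. A minimal, self-contained sketch (not the patch from the paste; the GITHUB_WEBHOOK_TOKEN variable is hypothetical):

```python
import configparser
import os

# Hypothetical variable; in a container it would come from a Kubernetes
# secret exposed through the pod's environment.
os.environ.setdefault('GITHUB_WEBHOOK_TOKEN', 'example-token')

# Escape literal '%' so environment values cannot break interpolation, then
# seed the parser's DEFAULT section with the whole environment.
env = {key: value.replace('%', '%%') for key, value in os.environ.items()}
config = configparser.ConfigParser(defaults=env)

# In real use this would be config.read('/etc/zuul/zuul.conf').
config.read_string("""
[connection github]
driver = github
webhook_token = %(GITHUB_WEBHOOK_TOKEN)s
""")

print(config.get('connection github', 'webhook_token'))  # -> example-token
```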
dmsimard | SpamapS: oh, it's not for a job, it's for actually deploying zuul ? | 20:41 |
SpamapS | reading up on the caveats now | 20:41 |
SpamapS | since I literally just learned about this 120 seconds ago. | 20:41 |
SpamapS | dmsimard: correct | 20:41 |
dmsimard | ok, my bad, I had the job use case in mind | 20:41 |
SpamapS | Yeah, and I'm explicitly avoiding ansible for any of it just to avoid ansibleception. | 20:42 |
dmsimard | fair :) | 20:42 |
dmsimard | so you're using puppet? jk | 20:42 |
SpamapS | (though it looks like ansible+k8s should be a lot simpler in ansible 2.7) | 20:42 |
dmsimard | yeah.. the awx installer uses ansible to deploy itself in k8s/openshift | 20:43 |
clarkb | SpamapS: fwiw https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#add-configmap-data-to-a-volume implies you could write multiple config files | 20:43 |
clarkb | the value is the file content an the key the filename at that volume mount point | 20:43 |
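What clarkb is pointing at looks roughly like this: every key in a ConfigMap's data becomes a file under the volume's mountPath. The names below are illustrative, not from an actual deployment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zuul-config
data:
  zuul.conf: |
    [scheduler]
    tenant_config=/etc/zuul/main.yaml
  main.yaml: |
    - tenant:
        name: example
# In the pod spec, mounting the ConfigMap exposes each key as a file:
#   volumes:
#   - name: zuul-config
#     configMap:
#       name: zuul-config
#   containers:
#   - name: scheduler
#     volumeMounts:
#     - name: zuul-config
#       mountPath: /etc/zuul
```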
SpamapS | clarkb: configmaps and secrets cannot land in the same dir, nor can they be symlinked reliably. | 20:46 |
SpamapS | which is exactly the frustration I landed on | 20:46 |
SpamapS | zuul.conf ended up having to be a secret entirely. | 20:46 |
clarkb | how are you suopposed to use secrets with config | 20:46 |
clarkb | thats seems broken | 20:46 |
SpamapS | YES | 20:46 |
SpamapS | secrets are carefully handled in such a way that they are very hard to compromise accidentally and don't end up on disk ever. | 20:47 |
SpamapS | but that makes them a bit rigid. | 20:47 |
SpamapS | I believe you can probably figure out a way to make something like zuul.conf in one dir, and zuul-secure.conf in the same dir. | 20:48 |
dmsimard | secrets are also very securely "encrypted" in base64 in etcd :( | 20:48 |
SpamapS | but.. I'm just thinking, I kind of like the idea of just sticking them in the environment. | 20:48 |
clarkb | dmsimard: and etcd doesn't have read acls (or did v3 add that?) | 20:48 |
dmsimard | not sure, openshift has acls for sure but I'm not sure if that comes from etcd or k8s | 20:48 |
clarkb | SpamapS: ya if that works and configparser supports it sanely seems like a reasonable approach | 20:49 |
SpamapS | etcd3 does in fact have RBAC | 20:49 |
clarkb | nice | 20:49 |
SpamapS | but IIRC the recommendation is to replace that secret storage with something better | 20:50 |
SpamapS | https://kubernetes.io/docs/concepts/configuration/secret/#protections | 20:53 |
SpamapS | FYI, if interested | 20:53 |
*** openstackgerrit has joined #zuul | 20:54 | |
openstackgerrit | Merged openstack-infra/zuul master: Run zookeeper datadir on tmpfs during testing https://review.openstack.org/612766 | 20:54 |
clarkb | I guess ^ worked thats neat | 20:54 |
SpamapS | and looks like they have a path to encrypted-at-rest secrets | 20:54 |
SpamapS | https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/ | 20:55 |
*** manjeets has joined #zuul | 21:01 | |
SpamapS | hm.. so with a few more lines of code, instead of doing %(ENV_VAR)s, we could instead have $ENV_VAR and ${ENV_VAR} work.. | 21:01 |
SpamapS | the latter would be less surprising. Like, people might just try that without reading the docs. | 21:02 |
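One way the ${ENV_VAR} form could be implemented (a sketch of the general idea, not the eventual Zuul change) is a custom interpolation class that runs os.path.expandvars over values after the normal %-interpolation; the ZUUL_DB_PASSWORD variable is hypothetical:

```python
import configparser
import os


class EnvInterpolation(configparser.BasicInterpolation):
    """Additionally expand $VAR and ${VAR} from the process environment."""

    def before_get(self, parser, section, option, value, defaults):
        value = super().before_get(parser, section, option, value, defaults)
        return os.path.expandvars(value)


os.environ.setdefault('ZUUL_DB_PASSWORD', 'example')
config = configparser.ConfigParser(interpolation=EnvInterpolation())
config.read_string("""
[database]
dburi = mysql+pymysql://zuul:${ZUUL_DB_PASSWORD}@db/zuul
""")
print(config.get('database', 'dburi'))
```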
*** caphrim00_ has joined #zuul | 21:03 | |
*** caphrim00_ has quit IRC | 21:07 | |
*** caphrim007 has joined #zuul | 21:08 | |
caphrim007 | hey folks, can i ask nodepool questions here? or is that another channel? | 21:09 |
corvus | SpamapS: that could be a thing | 21:09 |
corvus | caphrim007: this is the place | 21:09 |
caphrim007 | corvus: thanks. had a question. if I have a min-ready of 1 in my nodepool.conf and an existing node created in my openstack. i manually deleted the node in openstack, and expected that nodepool would rectify that. was that an incorrect assumption? didn't seem to "work" until i did nodepool delete foo | 21:10 |
caphrim007 | the existing node had been created via nodepool btw | 21:11 |
corvus | caphrim007: ah nope, i don't think nodepool will detect that (it doesn't expect nodes to be deleted from under it) | 21:12 |
caphrim007 | ahh ok. thanks for the clarification! | 21:12 |
corvus | caphrim007: but nodepool delete will delete the node in nodepool and openstack | 21:12 |
clarkb | it should eventually cleanup the node from its db (and in the cloud if necessary) after the ready timeout is reached though | 21:12 |
clarkb | but that is a long timeout by default | 21:12 |
corvus | clarkb: oh, right, that's true. but that's like 8+ hours i think | 21:13 |
clarkb | (8hours?) | 21:13 |
corvus | it's more likely to just try to use the node and fail before then | 21:13 |
caphrim007 | ok. that brings me to another question actually corvus. how does zuul instruct nodepool to create something? does it use an api for that? because i see no thing like "nodepool create foo" | 21:13 |
corvus | caphrim007: yes, zuul puts requests into zookeeper and nodepool handles them. there isn't an external interface to this api yet (probably some day, but we still consider it a private api for now while we iron it out) | 21:14 |
corvus | caphrim007: you can inspect it with 'nodepool request-list' | 21:14 |
corvus | that will show any pending requests from zuul | 21:15 |
caphrim007 | oh, interesting | 21:15 |
manjeets | Hello zuul community I'm trying to set up a 3rd party job for an opensource project one of openstack projects, and found this https://zuul-ci.org/docs/zuul/admin/quick-start.html#quick-start | 21:17 |
manjeets | I can disable gerrit service from this and configure it to point to gerrit for opensource project ? | 21:17 |
corvus | caphrim007: here's the current output from openstack's nodepool if you are curious: http://paste.openstack.org/show/732858/ | 21:18 |
corvus | manjeets: yes, you should be able to do that. the zuul.conf file is bind-mounted into the container, so you can just edit it in your local directory. you can remove the gerrit container from the docker-compose file to avoid running it. | 21:20 |
corvus | manjeets: note that it will leave "localhost" links for the logs, so you will also need to change the log url to something that other people can access. | 21:21 |
manjeets | corvus thanks, should i just remove gerrit from docker-compose or do I have to delete the gerrit-config tag as well? If I understood correctly, i'll point zuul.conf at the gerrit event stream of the upstream patches? | 21:22 |
corvus | caphrim007: i forgot (until i pasted that output) that even min-ready nodes go through the request system, so you should be able to see that too. | 21:22 |
caphrim007 | corvus: alrighty. thanks! | 21:23 |
corvus | manjeets: you can remove gerrit-config as well | 21:23 |
caphrim007 | corvus: is main.yaml used by all the zuul components too? or only select ones? | 21:23 |
corvus | manjeets: and yes, you can update zuul.conf to connect to an upstream gerrit instead of the local one. | 21:23 |
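A sketch of the zuul.conf edit being described, using the Gerrit driver's documented connection options; the host, account name, and key path are placeholders:

```ini
[connection gerrit]
driver=gerrit
server=review.openstack.org
baseurl=https://review.openstack.org
user=my-third-party-ci
sshkey=/var/lib/zuul/.ssh/id_rsa
```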
corvus | caphrim007: it's only used by the scheduler to bootstrap the rest of the configuration | 21:24 |
caphrim007 | kk | 21:24 |
caphrim007 | ahh yeah. there it is in zuul.conf. duh tim | 21:24 |
manjeets | corvus thanks I'm building that right now might come back to ask about issue if I run into any ! | 21:24 |
corvus | manjeets: great, we're happy to help -- when you're done, maybe you can share your configuration -- other folks may be able to use it :) | 21:25 |
manjeets | corvus sure ! once I'm done I'll write documentation and make it public | 21:26 |
caphrim007 | corvus: is there a reference/code/anything that i can look at that covers the zuul.conf options? | 21:29 |
corvus | caphrim007: yes, i think they should all be covered on this page: https://zuul-ci.org/docs/zuul/admin/components.html | 21:33 |
corvus | caphrim007: (under each individual section) | 21:34 |
caphrim007 | corvus: ahh ok. is this config here, zuul_url, no longer a thing? https://github.com/openstack/windmill/blob/306602fc0c267837e2a4af68e510e1e7b705871b/config/zuul/zuul.conf.j2#L42 | 21:37 |
corvus | caphrim007: correct -- i think that was for zuul v2 | 21:38 |
caphrim007 | k | 21:38 |
*** spsurya has quit IRC | 21:38 | |
corvus | caphrim007: oh, there's also a little more zuul.conf option documentation in the drivers pages: https://zuul-ci.org/docs/zuul/admin/connections.html#drivers | 21:38 |
corvus | caphrim007: for example, to configure a github connection, here are the docs: https://zuul-ci.org/docs/zuul/admin/drivers/github.html#connection-configuration | 21:39 |
corvus | (those are in separate files -- one per driver -- since the drivers are supposed to be self-contained) | 21:39 |
caphrim007 | ahh yes, right right. i'm just doing some rectifying between windmill and what i see in the current zuul docs | 21:40 |
clarkb | ianw: if you have a sec can you review https://review.openstack.org/#/c/609829/5 ? it adds the port cleanups to nodepool itself | 21:45 |
openstackgerrit | Clark Boylan proposed openstack-infra/nodepool master: Run test zookeeper on top of tmpfs https://review.openstack.org/612816 | 21:49 |
clarkb | Shrews: corvus ^ didn't see one of those get pushed yet so went ahead and did it atop the dstat change to see if we notice a difference | 21:49 |
clarkb | looking at http://logs.openstack.org/16/612816/1/check/tox-py36/f346739/dstat.html (tmpfs) vs http://logs.openstack.org/65/612765/1/check/tox-py36/d2d81c3/dstat.html (not tmpfs) we do drastically reduce the iops | 21:58 |
clarkb | whether or not that has a hand in making the tmpfs change pass vs the failure in the not tmpfs run hard to say | 21:58 |
clarkb | the dstat info for the not tmpfs run doesn't actually look all that bad | 21:58 |
clarkb | load is low, plenty of cpu idle time, low memory usage etc | 22:00 |
ianw | clarkb: i will take a closer look today. i was wondering if this is actually something nodepool should work around, or if it was a very specific problem | 22:06 |
clarkb | ianw: we've seen it on other clouds too (like packethost recently, but also hpcloud way back when iirc) | 22:07 |
clarkb | I expect it will be a useful thing to have nodepool understand :/ | 22:07 |
ianw | i guess "this shouldn't happen but does" is the raison d'être of openstacksdk, and nodepool to some extent | 22:08 |
clarkb | http://logs.openstack.org/16/612816/1/check/tox-py35/70a3157/job-output.txt.gz I think that rules out iowait as the cause of the zk problems in nodepool test suite | 22:09 |
clarkb | I wonder what our ulimit is there | 22:09 |
*** caphrim007 has quit IRC | 22:21 | |
clarkb | reading the failed test logs again the issue is in wait_for_threads | 22:32 |
clarkb | we actually do build the image and boot a node that we are waiting on but there must be some unexpected background thread running that holds us up | 22:33 |
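For context, the wait_for_threads check clarkb mentions is essentially a leak detector run at the end of each test: it fails if background threads outside a known whitelist are still alive. A simplified sketch of the idea (not nodepool's actual helper or whitelist):

```python
import threading
import time

# Threads the harness expects to still be running when a test finishes.
EXPECTED_THREADS = {'MainThread'}


def wait_for_threads(timeout=10):
    """Wait for stray background threads to exit; fail the test if any leak."""
    deadline = time.time() + timeout
    leaked = []
    while time.time() < deadline:
        leaked = [t.name for t in threading.enumerate()
                  if t.name not in EXPECTED_THREADS]
        if not leaked:
            return
        time.sleep(0.1)
    raise AssertionError('Leaked threads: %s' % leaked)
```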
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add the process environment to zuul.conf parser https://review.openstack.org/612824 | 22:47 |
clarkb | ok I've tried to reproduce that locally and am having a very hard time. Seems to be fine from here | 22:56 |
clarkb | ran that specific test ~30 times and now running the full test suite to see if it is an interaction between test threads | 22:57 |
*** threestrands has joined #zuul | 23:02 | |
openstackgerrit | Clark Boylan proposed openstack-infra/nodepool master: Do not merge https://review.openstack.org/612828 | 23:08 |
clarkb | I really dislike ^ but unsure of where else to look since local reproduction isn't working | 23:08 |
*** rlandy is now known as rlandy|bbl | 23:17 | |
clarkb | at least it caught one | 23:24 |
clarkb | ok I think the real-cloud thread is leaking across tests | 23:26 |
clarkb | I'm going to guess this is a side effect of the openstacksdk release that happened | 23:29 |
clarkb | because openstacksdk is going to run a thread for the api request throttling? | 23:29 |
ianw | clarkb: so you're saying the new thread hasn't been skipped in the wait? | 23:51 |
clarkb | ianw: yes, though reading our task manager and nodepool fixtures I expect that this thread should be stopped | 23:56 |
clarkb | ianw: now that I have that info I'm trying to hack together a reproduction locally by running the sdk integration test before the webapp test | 23:57 |
clarkb | currently trying to figure out how to enforce test order | 23:57 |